THE BELL SYSTEM TECHNICAL JOURNAL
Vol. 62, No. 10, Part 2


COMPUTING SCIENCE AND SYSTEMS 


Theory of Program Testing—An Overview
R. E. Prather

Parallel Fault Simulation Using Distributed Processing
Y. H. Levendel, P. R. Menon, and S. H. Patel

Two New Kinds of Biased Search Trees
J. Feigenbaum and R. E. Tarjan

An Algebraic Theory of Relational Databases
T. T. Lee

Generation of Syntax-Directed Editors With Text-Oriented Features
B. A. Bottos and C. M. R. Kintala

Performance Analysis of a Preemptive Priority Queue With
Applications to Packet Communication Systems
M. G. Hluchyj, C. D. Tsao, and R. R. Boorstyn


THE BELL SYSTEM TECHNICAL JOURNAL 


ADVISORY BOARD 


D. E. PROCKNOW, President, Western Electric Company 
I. M. ROSS, President, Bell Telephone Laboratories, Incorporated 
W. M. ELLINGHAUS, President, American Telephone and Telegraph Company 


EDITORIAL COMMITTEE 


A. A. PENZIAS, Chairman, M. M. BUCHNER, JR., R. P. CLAGETT, B. R. DARNALL, 
B. P. DONOHUE, III, I. DORROS, S. HORING, R. A. KELLEY, R. W. LUCKY, R. L. MARTIN, 
J. S. NOWAK, G. SPIRO, and J. W. TIMKO 


TECHNICAL EDITORIAL BOARD 


M. D. McILROY, Technical Editor, A. V. AHO, D. L. BAYER, W. FICHTNER, L. E. GALLAHER, 
R. W. GRAVES, M. G. GRISHAM, B. W. KERNIGHAN, Y. E. LIEN, S. G. WASILEW, and S. J. YUILL 


EDITORIAL STAFF 


B. G. KING, Editor, PIERCE WHEELER, Managing Editor, LOUISE S. GOLLER, Assistant Editor, 
H. M. PURVIANCE, Art Editor, and B. G. GRUBER, Circulation 


THE BELL SYSTEM TECHNICAL JOURNAL (ISSN 0005-8580) is published by the American 
Telephone and Telegraph Company; 195 Broadway, N.Y., N.Y. 10007, C. L. Brown, Chairman 
and Chief Executive Officer; W. M. Ellinghaus, President; V. A. Dwyer, Vice President and 
Treasurer; T. O. Davis, Secretary. 


The Journal is published in three parts. Part 1, general subjects, is published ten times each 
year. Part 2, Computing Science and Systems, and Part 3, single-subject issues, are published 
with Part 1 as the papers become available. 


The subscription price includes all three parts. Subscriptions: United States—1 year $35; 2 years 
$63; 3 years $84; foreign—1 year $45; 2 years $73; 3 years $94. Subscriptions to Part 2 only are 
$10 ($11 foreign). Single copies of the journal are available at $5 ($6 foreign). Payment for 
foreign subscriptions or single copies must be made in United States funds, or by check drawn 
on a United States bank and made payable to The Bell System Technical Journal and sent to 
Bell Laboratories, Circulation Dept., Room 1E-335, 101 J. F. Kennedy Parkway, Short Hills, N. J. 
07078. 


Single copies of material from this issue of The Bell System Technical Journal may be reproduced 
for personal, noncommercial use. Permission to make multiple copies must be obtained from 
the editor. 


Comments on the technical content of any article or brief are welcome. These and other 
editorial inquiries should be addressed to the Editor, The Bell System Technical Journal, Bell 
Laboratories, Room 1J-319, 101 J. F. Kennedy Parkway, Short Hills, N. J. 07078. Comments and 
inquiries, whether or not published, shall not be regarded as confidential or otherwise restricted 
in use and will become the property of the American Telephone and Telegraph Company. 
Comments selected for publication may be edited for brevity, subject to author approval. 


Printed in U.S.A. Second-class postage paid at Short Hills, N. J. 07078 and additional mailing 
offices. Postmaster: Send address changes to The Bell System Technical Journal, Room 1E-335, 
101 J. F. Kennedy Parkway, Short Hills, N. J. 07078. 


© 1983 American Telephone and Telegraph Company. 


THE BELL SYSTEM 
TECHNICAL JOURNAL 


DEVOTED TO THE SCIENTIFIC AND ENGINEERING 
ASPECTS OF COMPUTING 


Volume 62 December 1983 Number 10, Part 2 


Theory of Program Testing—An Overview 


By R. E. PRATHER* 
(Manuscript received January 18, 1983) 


In this paper, we provide a detailed survey of the various approaches to 
program testing that have been proposed in recent years. Particular attention 
is given to a discussion of the developing theory of program testing and to the 
decomposition of the testing problem into the program graph construction, 
test path selection, and test case generation phases. Examples are included to 
illustrate the different testing strategies. Comparisons are made from one 
method to another, all in a uniform terminology and notation, to facilitate an 
understanding of various combinations of strategies that might lead to a more 
workable testing methodology. 


I. INTRODUCTION 


The general goal of software testing is to affirm the quality of a 
program through systematic exercising of the code in a carefully 
controlled environment. The execution of a program test scheme 
should validate an expected prespecified behavior, ideally serving to 
demonstrate the absence of program errors. Considering the difficulty 
of obtaining actual proofs of program correctness, program testing 
may be the only effective means for assuring the quality of software 
systems of nontrivial complexity. 

The state of the art in software testing as of a decade ago is broadly 
surveyed in the book by Hetzel,¹ representing an ad hoc approach at 
best. During the intervening years, computer programming method- 
ology has made great strides toward improving the quality of our 
product. And yet, software testing has remained a kind of “black art”, 
only vaguely understood by its practitioners. Happily, this situation is 
changing. The beginnings of a theory of testing are well under way, 
and the more recent literature shows great promise 
for brighter days ahead. Some of these ideas are discussed in a new 
book by Myers,² and further elaboration can be found in the survey 
papers by Miller.*® 

In this overview, we summarize in detail the more recent literature 
on software testing and present the more important results in a 
uniform framework, style, and notation. We hope that this perspective 
will help to focus attention on the more viable alternatives and to 
point the way toward the most promising directions for future research 
and development. 


II. GENERAL THEORY—THE FUNCTIONAL APPROACH 


The first attempt to describe a generalized theory of testing is found 
in the work of Goodenough and Gerhart.⁷,⁸ A related study is that of 
Hamlet.⁹ In the former, a program is viewed as a function F: D → R 
over an input domain D with values in an output range R. The program 
specification can also be viewed as a function G: D → R, whether 
completely specified or not. For testing purposes, we must compare 
F(d) with G(d) for selected inputs d in D. Though an exhaustive test 
over all of D is not feasible in general, we say that the program F is correct if 
we have 


F(d) = G(d) (for all d in D), 


recognizing that this is simply a theoretical notion, one not necessarily 
capable of direct verification. 

In any practical setting, we will only be able to examine the behavior 
of the program for a few selected input values. Realizing this, we say 
that a test for the program F is a (finite) subset T of D. Recalling the 
‘goal of software testing,’ T is said to be an ideal test (for F) if 


success(T) ⇒ correct(F), 


i.e., if F(t) = G(t) for t in T implies the same for all t in D. Note that 
the successful execution of an ideal test would constitute a proof of 
correctness. Given the difficulty in finding proofs of correctness, 
however, we should not be surprised to learn that ideal tests, in this 
sense, are difficult to discover. (We note that the ‘trivial’ ideal test, 
the exhaustive one with T = D, though easily stated is ordinarily 
unmanageable in size.) 
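
The definitions of success(T), correct(F), and ideal test can be made concrete with a minimal Python sketch over a deliberately tiny domain. The domain D, the specification G (absolute value), and the seeded error in F are hypothetical choices made only for illustration; none of them comes from the paper.

```python
# Hypothetical toy setting: D is small enough to check exhaustively.
D = range(-3, 4)

def G(d):
    """Specification (assumed for illustration): absolute value."""
    return abs(d)

def F(d):
    """Program under test, with a seeded error at d == -3."""
    if d == -3:
        return 0
    return d if d >= 0 else -d

def success(T):
    """A test T succeeds if F agrees with G on every datum in T."""
    return all(F(t) == G(t) for t in T)

def correct():
    """F is correct if it agrees with G on the whole domain D."""
    return all(F(d) == G(d) for d in D)

def is_ideal(T):
    """T is ideal if success(T) implies correct(F)."""
    return (not success(T)) or correct()

print(success([0, 1, -1]), correct(), is_ideal([0, 1, -1]))
print(is_ideal([-3, 2]), is_ideal(list(D)))
```

In this sketch the test {0, 1, -1} succeeds although F is incorrect, so it is not ideal, whereas any test containing -3 (including the exhaustive one) is ideal because it exposes the error.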

As a matter of fact, we would prefer not to ‘discover’ our tests at all, 
but to have them ‘selected’ on the basis of some sensible criterion. 
Formally, a test selection criterion (for a program F) is a (true-false) 
predicate C over the subsets of D. Following Goodenough and Gerhart 
once again, such a criterion C is reliable (for F) if 


C(T1) and C(T2) ⇒ [success(T1) ⇔ success(T2)], 

and, on the other hand, C is said to be valid (for F) if 

~correct(F) ⇒ ~success(T) 


for some T satisfying C(T). In general, reliability refers to the con- 
sistency with which results will be produced within the selection 
criterion, whereas validity refers to the ability to produce meaningful 
results, regardless of their consistency. 
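
As a hedged illustration of reliability and validity, the sketch below enumerates every nonempty test over a four-element domain and checks two selection criteria. The domain, the functions F and G, and the criteria `any_two` and `has_max` are hypothetical names invented here, not taken from Goodenough and Gerhart.

```python
from itertools import combinations

D = [0, 1, 2, 3]                      # hypothetical input domain

def G(d):
    return d * d                      # assumed specification

def F(d):
    return 0 if d == 3 else d * d     # program with a seeded error at d == 3

def success(T):
    return all(F(t) == G(t) for t in T)

def correct():
    return all(F(d) == G(d) for d in D)

def selected(C):
    """All nonempty subsets T of D that satisfy the criterion C."""
    subsets = (set(c) for r in range(1, len(D) + 1) for c in combinations(D, r))
    return [T for T in subsets if C(T)]

def reliable(C):
    """Every test selected by C gives the same outcome."""
    return len({success(T) for T in selected(C)}) <= 1

def valid(C):
    """If F is incorrect, some test selected by C fails."""
    return correct() or any(not success(T) for T in selected(C))

def any_two(T):
    return len(T) == 2                # criterion: any two inputs

def has_max(T):
    return max(D) in T                # criterion: must include the largest input

print(reliable(any_two), valid(any_two))
print(reliable(has_max), valid(has_max))
# When a criterion is both reliable and valid, every selected test is ideal.
print(all((not success(T)) or correct() for T in selected(has_max)))
```

Here `any_two` is valid but not reliable (some pairs miss the error), while `has_max` is both, and each test it selects is ideal for this particular F, in line with the theorem that follows.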

It is clear that these notions of reliability and validity are quite 
strong. Perhaps the most convincing statement to this effect is given 
by the following: 

Theorem (Goodenough and Gerhart): If C is reliable and valid, then 
C(T) implies that T is an ideal test. 

On the other hand, Weyuker and Ostrand¹⁰ have argued that these 
notions are not strong enough, referring as they do to a particular 
program. If the same ideas are extended, however, so as to apply 
“uniformly” over all programs F, then one obtains the following: 


Theorem (Weyuker and Ostrand): If C is uniformly reliable and uni- 
formly valid, then C(T) implies that T = D, i.e., T is an exhaustive test. 


Surely this carries the original ideas too far. And in fact, the theorem 
can be understood to say, “If nothing is known about the errors in the 
program, a test criterion is guaranteed ideal (in the sense of Gooden- 
ough and Gerhart) if and only if it selects the entire input domain.” 
What is probably needed to arrive at a more practical alternative is a 
weakening of the Goodenough and Gerhart theory. This is the general 
thrust of Hamlet’s work, but results along these lines thus far are less 
than satisfactory, showing perhaps more promise toward applications 
to program maintenance than to testing. The interested reader should 
consult Ref. 9 for details. 

A test selection criterion C should outline the properties of a 
program that must be exercised to constitute a “thorough” test, ideally 
one whose successful execution implies an error-free program. Follow- 
ing Goodenough and Gerhart once again, we may suppose that C is 
described as a finite set {c} of test predicates (i.e., logical conditions on 
the input data), and we then choose T subject to the condition(s): 


C(T) ⇔ [for all c in C, there is t in T with c(t)] 
       and [for all t in T, there is c in C with c(t)].        (*) 

In words, every test predicate belonging to C should be satisfied by at 
least one test datum t in T, and conversely, every t in T must satisfy 
at least one test predicate. 
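
Condition (*) translates directly into two nested quantifier checks. The following is a minimal sketch; the function name `satisfies_star` and the two example predicates are hypothetical.

```python
def satisfies_star(C, T):
    """Check condition (*): every predicate in C is hit by some datum in T,
    and every datum in T satisfies some predicate in C."""
    every_predicate_hit = all(any(c(t) for t in T) for c in C)
    every_datum_used = all(any(c(t) for c in C) for t in T)
    return every_predicate_hit and every_datum_used

# Hypothetical test predicates over integer inputs.
C = [lambda x: x < 0, lambda x: x == 0]

print(satisfies_star(C, [-5, 0]))     # both predicates hit, both data used
print(satisfies_star(C, [-5]))        # the predicate x == 0 is never satisfied
print(satisfies_star(C, [-5, 0, 7]))  # the datum 7 satisfies no predicate
```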

It is suggested that the test predicates be derived from the program 
specifications—this is the essence of the functional approach (or “black 
box” approach) to testing. But the claim is made⁷ that to have a 
reasonable chance of constituting a reliable criterion, C must be 
composed of test predicates satisfying (at least) the following set of 
conditions: 

Condition 1: Every individual branching condition in the program 
must be represented by an equivalent test predicate. 

Condition 2: Every potential termination condition (e.g., error, 
overflow, etc.) must be represented by a corresponding test predicate. 

Condition 3: The range of every variable appearing in a test predi- 
cate must be partitioned into classes that are “treated in the same 
way” by the program. 

Condition 4: Every condition relevant to the proper functioning of 
the program that is implicit in the program specification or of one’s 
knowledge of the program must be represented by a corresponding 
test predicate. 

Condition 5: The test predicates must be “independent,” in that all 
data satisfying a particular test predicate must exercise the same path 
in the program and must test the same branch conditions. 

We note that only the second and fourth of these conditions are of a 
“functional” nature. The others are “structural,” that is, relating more 
to the topology of the underlying flowchart. It would seem, therefore, 
that any reasonable testing strategy should address both points of 
view. 

Consider the following example, the often cited problem of classi- 
fying triangles: 

Specification: 

Input: Three positive integers a ≥ b ≥ c. 

Output: An indication as to whether: 

1. They do not represent the sides of a triangle 
2. They are the sides of an equilateral triangle 
3. They are the sides of an isosceles triangle 
4. They are the sides of a scalene right triangle 
5. They are the sides of a scalene obtuse triangle 
6. They are the sides of a scalene acute triangle. 

This problem is especially well suited to the “functional” approach. 
Since the whole purpose of the problem is to classify its input 
domain, there is an obvious specification-based derivation of test 
predicates. We may first divide the universe of triples (a, b, c) into 
legal and illegal forms: 


        (a ≥ b) and (b ≥ c) 
          T /          \ F 
        legal        c1: illegal 

For the legal entries, we may further distinguish two cases: 

        (a ≥ b) and (b ≥ c) 
                | T 
            a ≥ b + c 
          Y /        \ N 
c2: not a triangle    triangle 


and the triangles may then be subdivided into six subclasses: 


c3: (a = b) and (b = c)                                          equilateral 
c4: (a = b) and (b > c)                                          isosceles 
c5: (a > b) and (b = c) and (a < b + c)                          isosceles 
c6: (a > b) and (b > c) and (a*a = b*b + c*c)                    right scalene 
c7: (a > b) and (b > c) and (a*a < b*b + c*c)                    acute scalene 
c8: (a > b) and (b > c) and (a*a > b*b + c*c) and (a < b + c)    obtuse scalene. 


If we set C = {c(i) : i = 1 to 8} and choose one triple from each input 
subdomain, we may obtain the test set: 


t1 = (1, 2, 3) 
t2 = (14, 6, 4) 
t3 = (1, 1, 1) 
t4 = (2, 2, 1) 
t5 = (3, 2, 2) 
t6 = (5, 4, 3) 
t7 = (6, 5, 4) 
t8 = (4, 3, 2). 


Such a test set will automatically satisfy (*) and the test selection 
criteria will more than likely meet Conditions 2 and 4 above. But we 
have no guarantee that the “structural” conditions 1, 3, 5 will be met, 
since we haven’t looked at the program! 
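
The problem partition can be checked mechanically. The sketch below transcribes the predicates c1 to c8 from the text into Python (the dictionary layout and the helper names `legal`, `excess`, and `classes` are assumptions of this sketch) and confirms that each test datum t_i satisfies exactly the predicate c_i. It also checks, over a small arbitrary range of positive integers, that every triple satisfies exactly one predicate.

```python
from itertools import product

def legal(a, b, c):
    return a >= b >= c

def excess(a, b, c):
    """a*a - (b*b + c*c): zero for right, negative for acute, positive for obtuse."""
    return a * a - (b * b + c * c)

# Test predicates c1..c8, transcribed from the problem partition in the text.
C = {
    1: lambda a, b, c: not legal(a, b, c),
    2: lambda a, b, c: legal(a, b, c) and a >= b + c,
    3: lambda a, b, c: a == b == c,
    4: lambda a, b, c: a == b > c,
    5: lambda a, b, c: a > b == c and a < b + c,
    6: lambda a, b, c: a > b > c and excess(a, b, c) == 0,
    7: lambda a, b, c: a > b > c and excess(a, b, c) < 0,
    8: lambda a, b, c: a > b > c and excess(a, b, c) > 0 and a < b + c,
}

# Test data t1..t8 from the text.
T = {1: (1, 2, 3), 2: (14, 6, 4), 3: (1, 1, 1), 4: (2, 2, 1),
     5: (3, 2, 2), 6: (5, 4, 3), 7: (6, 5, 4), 8: (4, 3, 2)}

def classes(t):
    """Indices of all predicates satisfied by the triple t."""
    return [i for i, c in C.items() if c(*t)]

for i, t in T.items():
    print(f"t{i} = {t} satisfies c{classes(t)}")

# The bound 10 is arbitrary; it only keeps the enumeration small.
print(all(len(classes(t)) == 1 for t in product(range(1, 10), repeat=3)))
```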

Weyuker and Ostrand¹⁰ have made the cogent suggestion that the 
input domain be partitioned both on the basis of the specification- 
driven, program-independent properties mentioned above, and on the 
structural properties of the program as well. It seems that this is the 
only way to meet all five of the Goodenough and Gerhart conditions, 
and to thus have a chance of approaching a reliable test selection 
criterion C = {c} defined by a set of test predicates. 

Suppose we add a sixth (implicit) condition to the five that are 
outlined above, namely: 


Condition 6: The test predicates must be “complete” in that every 
input of the domain D must satisfy (exactly—see condition 5) one of 
the test predicates. 


Then Conditions 5 and 6 ensure that C = {c} defines a partition 

χ = {C} 


on the input domain. When we concentrate only on the problem 
specifications, as above, we obtain the problem partition consisting of 
problem domains C. Having a completed version of the program in 
hand, we may speak as well of a path partition 


π = {P} 


of the same domain D, where each path domain P comprises a class of 
inputs that traverse the same path through the program. Thus, the 
path partition separates D into classes of inputs that are treated the 
same way by the program, whereas the problem partition separates D 
into classes that should be treated the same. 

There is no assurance that these two partitions will coincide, nor is 
it necessary that they do. But ultimately (or at least, hopefully), the 
program and its algorithm all derive from the original problem speci- 
fication, so we should not expect the two partitions to differ markedly. 
On the other hand, those differences that do exist are fruitful places 
to look for errors! Recognizing this, Weyuker and Ostrand have 
suggested that the problem and path partitions be intersected, yielding 
a finer partition 


σ = χ ∧ π = {C ∩ P} = {S} 


of nonoverlapping subdomains S of the domain D, and they further 
suggest that this be used as the ultimate test selection criterion, 
choosing one test case from each subdomain as before. 

In the terminology of Weyuker and Ostrand,¹⁰,¹¹ a subdomain S of 
D is said to be revealing (of errors) if 


success(s in S) ⇒ correct(F, S), 


i.e., if the successful execution of any input from S implies correctness 
of the program over the whole subdomain. Since the inputs of a 
subdomain S (in the intersection above) should be and in fact are 
treated the same by the program, the hope is extended that these are, 
in essence, the revealing subdomains. A successful execution of one 
test datum from each of the subdomains S then implies (or at least 
suggests) the correctness of the program over the whole domain. 

Fig. 1—Flowchart for classifying triangles. (Surviving box labels: “ILLEGAL”, “EQUILATERAL”, “ISOSCELES”, “OBTUSE”.) 

Consider the “triangle classification problem” once again, and suppose 
we are presented with the program (flowchart) of Fig. 1 as 
representing a solution to the problem. There are six paths through 
the program, as described by the conjunctions of branch conditions 
defined by each path, as follows: 


p1: ~[(a ≥ b) and (b ≥ c)] = (a < b) or (b < c) 
p2: (a > b > c) and (a*a = b*b + c*c) 
p3: (a > b > c) and (a*a < b*b + c*c) 
p4: (a > b > c) and (a*a > b*b + c*c) 
p5: (a ≥ b ≥ c) and [(a = b) or (b = c)] and ~[(a = b) and (b = c)] 
    = (a = b > c) or (a > b = c) 
p6: a = b = c. 


Intersecting the six corresponding path domains P[i] (i = 1 to 6) with 
the eight earlier problem domains C[i] results in a partition {S} of 
nine subdomains characterized as follows: 


S1 = C1 = C1 ∩ P1: (a < b) or (b < c) 
S2 = C2 ∩ P4:      (a > b > c) and (a ≥ b + c) 
S3 = C2 ∩ P5:      (b = c) and (a ≥ b + c) 
S4 = C3 = C3 ∩ P6: a = b = c 
S5 = C4 = C4 ∩ P5: a = b > c 
S6 = C5 = C5 ∩ P5: (a > b = c) and (a < b + c) 
S7 = C6 = C6 ∩ P2: (a > b > c) and (a*a = b*b + c*c) 
S8 = C7 = C7 ∩ P3: (a > b > c) and (a*a < b*b + c*c) 
S9 = C8 = C8 ∩ P4: (a > b > c) and (a*a > b*b + c*c) and (a < b + c) 


in very close agreement with the problem partition {C} obtained 
earlier. 
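
The intersection of the two partitions can be reproduced by brute force. In the sketch below, `problem_class` returns the index i of the problem domain C[i] and `path_class` the index j of the path domain P[j], both transcribed from the predicates in the text. The function names and the enumeration bound of 7 are assumptions of this sketch.

```python
from itertools import product

def problem_class(a, b, c):
    """Index i of the problem domain C[i] containing (a, b, c)."""
    if not (a >= b >= c):
        return 1
    if a >= b + c:
        return 2
    if a == b == c:
        return 3
    if a == b:
        return 4                      # a = b > c
    if b == c:
        return 5                      # a > b = c and a < b + c
    lhs, rhs = a * a, b * b + c * c
    return 6 if lhs == rhs else 7 if lhs < rhs else 8

def path_class(a, b, c):
    """Index j of the path domain P[j], following path predicates p1..p6."""
    if not (a >= b >= c):
        return 1
    if a == b or b == c:
        return 6 if a == b == c else 5
    lhs, rhs = a * a, b * b + c * c
    return 2 if lhs == rhs else 3 if lhs < rhs else 4

# Nonempty intersections C[i] ∩ P[j], found by enumerating small triples.
pairs = sorted({(problem_class(*t), path_class(*t))
                for t in product(range(1, 8), repeat=3)})
print(len(pairs))
print(pairs)
```

Under these assumptions the enumeration yields nine pairs, with C[2] the only problem domain split across two path domains (P[4] and P[5]), matching S2 and S3 above.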

The problem we are discussing has a rather precise functional 
specification so that we would expect that the problem and path 
partitions might nearly coincide. Nevertheless, there is a slight dis- 
crepancy, and in place of the test datum t2 = (14, 6, 4) we would now 
have to choose two, say (14, 6, 4) and (3, 1, 1). A test of the resulting 
nine data points would then reveal two errors, as shown in Table I 
below. 


Table I—Test of nine data points 


Domain   Test Data    Correct Output    Actual Output 
S1       (1, 2, 3)    Illegal           illegal 
S2       (14, 6, 4)   Not a triangle    obtuse 
S3       (3, 1, 1)    Not a triangle    isosceles 
S4       (1, 1, 1)    Equilateral       equilateral 
S5       (2, 2, 1)    Isosceles         isosceles 
S6       (3, 2, 2)    Isosceles         isosceles 
S7       (5, 4, 3)    Right             right 
S8       (6, 5, 4)    Acute             acute 
S9       (4, 3, 2)    Obtuse            obtuse 

The programmer has failed to take account of those situations where 
a ≥ b + c (not a triangle). And our test criteria are able to detect such 
errors. In fact, so detailed is the specification for this example that a 
test set based on the problem partition alone would have served equally 
well. 
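
The outcome of Table I can be replayed with a short sketch. Here `spec` encodes the specification and `program` follows the six paths p1 to p6 of the flowchart, which never reports "not a triangle". Both function names and the lowercase output labels are assumptions of this sketch, not the paper's notation.

```python
def spec(a, b, c):
    """Expected classification, following the problem specification."""
    if not (a >= b >= c):
        return "illegal"
    if a >= b + c:
        return "not a triangle"
    if a == b == c:
        return "equilateral"
    if a == b or b == c:
        return "isosceles"
    lhs, rhs = a * a, b * b + c * c
    return "right" if lhs == rhs else "acute" if lhs < rhs else "obtuse"

def program(a, b, c):
    """Flowchart behavior implied by paths p1..p6 (no triangle-inequality check)."""
    if not (a >= b >= c):
        return "illegal"
    if a == b or b == c:
        return "equilateral" if a == b == c else "isosceles"
    lhs, rhs = a * a, b * b + c * c
    return "right" if lhs == rhs else "acute" if lhs < rhs else "obtuse"

# One datum from each of the nine subdomains S1..S9.
tests = [(1, 2, 3), (14, 6, 4), (3, 1, 1), (1, 1, 1), (2, 2, 1),
         (3, 2, 2), (5, 4, 3), (6, 5, 4), (4, 3, 2)]

failures = [(t, spec(*t), program(*t)) for t in tests if spec(*t) != program(*t)]
for t, expected, actual in failures:
    print(t, "expected:", expected, "actual:", actual)
```

Only the data from S2 and S3 fail, which is exactly the pair of errors the table reports.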

In spite of the obvious relevance of the ideas presented here, partic- 
ularly those of Weyuker and Ostrand, a good deal of work remains to 
be done to apply the theory to a wide class of programs. One of the 
more important tasks is to find more systematic methods for con- 
structing the problem partition. This will not be easy, since finding a 
good problem partition is quite similar to the task of creating the 
program itself. It is suggested, however, that the development of formal 
specification languages would be helpful here, particularly if such 
developments are made with a specification-driven testing methodol- 
ogy in mind, along the lines presented here. 

An equally important consideration when thinking of applying the 
above theory to larger programs is that of obtaining the domains of 
the path partition. How are the paths to be described, generated, and 
selected with programs of increasing size and complexity? It is cer- 
tainly clear that our one example is misleading in this respect. We 
had only a small number of paths to consider, whereas a typical 
program of any size will have a very large number of paths, most likely 
an infinity of paths, owing to the presence of loops. If our test is still 
to be finite, how do we then choose paths judiciously? How are they 
described? And most importantly, how do we generate test cases that 
will traverse these paths, if indeed this is possible? These are some of 
the questions that we begin to address in the following sections. 


III. GENERAL THEORY—THE STRUCTURAL APPROACH 

In a structural approach to the theory of testing, a program F is 
represented by a “skeleton” of its underlying flowchart, a directed 
graph symbolizing the flow of control. This point of view is advanced 
most effectively in the extremely lucid survey paper by Huang.’ We 
should keep in mind, however, that the flowchart graph must be an 
accurate representation of the program flow in the code itself. Ordi- 
narily, this is ensured through the use of a “tool” that automatically 
generates the flowgraph from the source program listing. 

Using Huang’s terminology,'*"* a program block is a maximal se- 
quence of program statements having the property that if the first 
member of the sequence is executed, then all other statements in the 
sequence will also be executed. A program graph F = (V, E) is then a 
directed graph with vertex set V and edge set E, where each vertex is 
associated with a program block and in which there are pairs of edges: 


(i, j) labeled by the condition C 
(i, k) labeled by the condition ~C 

according as the flowchart segment in which a test of C follows Bi, with 
the T branch leading to Bj and the F branch leading to Bk, is 
encountered for blocks Bi, Bj, and Bk. (For convenience, we permit an 
empty block as a vertex in good standing, e.g., for treating an “if ... 
then ...” statement with vacuous “else” clause.) It is further assumed 
that the graph has a single entry point, the start vertex, and a single 
exit point, the stop vertex, and that every vertex lies on some path 
from ‘start’ to ‘stop.’ 

A path in a (program) graph is defined in the usual way, as a 
sequence of edges 


p = e1, e2, ..., en, 


though we ordinarily assume as well that we begin the sequence at 
‘start’ and end at ‘stop.’ Each such path has an associated path 
predicate 


P = P1 ∧ ... ∧ Pn 


written as a conjunction of the individual interpreted branch condition 
labels on the edges e, as discussed below (see Section V). The path 
predicates P are to be identified in one-to-one correspondence with 
the path domains of the previous section. Thus we may write (some- 
what ambiguously): 


D = ∪ P    for P = {d in D : P(d)} 


so that the program function F: D → R is a union of functions F(P): 
P → R restricting F to the individual path domains P. 

In structured testing, we examine the program (as a digraph) and 
we seek to choose a finite set of paths that will cover the program with 
a certain degree of thoroughness. It is then hoped that test data 
causing the program to be successfully executed when traversing these 
paths are sufficient to warrant our confidence in the program’s cor- 
rectness. The theoretical underpinnings of such a testing plan have 
been studied by Howden,’*"8 who relates his work to the earlier study 
by Goodenough and Gerhart.⁷ Rather than speaking of a “test criterion,” 
however, Howden refers to a testing strategy, as a uniform computable 
function 


H: (F: D → R) ↦ (T, subset of D) 

that associates with each program F a finite test set T of D. H is said 
to be an ideal strategy if each T = H(F) is an ideal test (for every 
program F). As is so often the case with testing theory, the first result 
is of a negative character: 


Theorem (Howden): An ideal testing strategy does not exist. 


Nevertheless, Howden was able to show that “path testing” can be 
a reliable approach, at least for detecting certain types of errors. He 
takes the view that the program being treated is a member of a class 
of programs differing only as to whether they are correct, and for 
which the incorrect programs have errors of various (known) types. 
His objective was then to find, if possible, a restricted set of programs 
for which certain forms of structured testing (i.e., path testing) would 
be reliable. Typical of Howden’s results is that which assumes that 
the error in a program does not change its control flow, i.e., that the 
set of path domains is not affected. 

Theorem (Howden): Path testing is a reliable method for distinguishing 
correct from incorrect programs, as long as the errors of incorrect 
programs do not affect the path partition. 


Of course, there are theoretical limitations in applying results such 
as this, since Howden has in mind our choosing one test datum from 
each path domain P, and these may be infinite in number. On the 
other hand, he has also devised a classification of error types that can 
be expected to lead to new insights into the testing problem generally. 
The reader should consult Howden’s work (particularly Refs. 16 and 
17) for further detail. 

In a practical test setting, we require that the subset T of D be 
finite. Moreover, if we are speaking of a “path testing strategy,” the 
above schema will be decomposed into the three-stage process, 


(F: D → R) - - - - - - - - - - - - - - -> (T of D) 
     |                                       ^ 
     | program graph               test case | 
     | construction               generation | 
     v                                       | 
F = (V, E) ---------------------------> {P} 
                test path selection 


summarized as follows: 

1. Program graph construction 

2. Test path selection 

3. Test case generation. 

The first phase of the process, to construct the program graph from a 
source code listing, is fairly straightforward, and for most of the 
conventional programming languages, e.g., FORTRAN, Pascal, etc., 
such implementations are already in existence. 

As a matter of fact, implementations of testing tools are in various 
stages of development for treating the entire process outlined above 
(e.g., see Clarke'’). But, as we shall see, there are serious problems 
associated with the latter stages of any proposed implementation along 
these lines. There are many alternative strategies to choose from, and 
seemingly, none of these is best for all situations. All we can do at this 
point is to outline the several alternatives and comment on their 
general suitability. We begin by introducing the various path selection 
criteria, continuing this discussion in the next section. The last, and 
perhaps the most difficult, of our three subprocesses, the generation 
of test data, is treated in Section V. 

There are, as we have indicated, a number of path selection criteria 
that can be used in attempting to devise a testing strategy that will 
provide a reasonable coverage of a program graph. Among these criteria 
are: 

1. Statement coverage: Execute all statements (blocks) in the graph. 

2. Node coverage: Encounter all decision node entry points in the 
graph. 

3. Branch coverage: Encounter all exit branches of each decision 
node in the graph. 

4. Multiple condition coverage: Achieve all possible combinations of 
condition outcomes at each decision node of the graph. 

5. Path coverage: Traverse all paths in the graph. 

These five strategies are related in their strength of coverage as shown 
below: 


path coverage      multiple condition coverage 
          \            / 
         branch coverage 
          /            \ 
node coverage      statement coverage 


with the weaker criteria at the bottom and the stronger criteria at the 
top. 

As an example illustrating the differing requirements of these cri- 
teria, consider the flowchart segment (program graph) shown in Fig. 
2. In order to achieve node coverage, the single test: 


abe: A=2,B=1,X=1 


will suffice (but it will not achieve statement coverage because the 
assignment X <— X/A will not have been executed). On the other hand, 
the single test: 


ace: A = 2, B = 0, X = 3 

will be sufficient for complete statement coverage (and node coverage 
as well). For branch coverage, however, at least two tests would be 
required, e.g., 


acd: A = 3, B = 0, X = 3 
abe: A = 2, B = 1, X = 1. 

Fig. 2—Flowchart segment illustrating coverage criteria. 


In the multiple condition coverage criterion, there are 2*2 + 2*2, or 8, 
outcomes to achieve in combination for the two simple conditions at 
each decision node. These may be satisfied, for example, by the 
selection of four separate tests, e.g., 


ace: A = 2, B = 0, X = 4 
abe: A = 2, B = 1, X = 1 
abe: A = 1, B = 0, X = 2 
abd: A = 1, B = 1, X = 1 


The first test satisfies the conditions A > 1, B = 0 in the first decision 
and A = 2, X > 1 in the second decision. The second test ensures that 
A > 1, B ≠ 0 for the first decision and A = 2, X ≤ 1 for the second 
decision. Further analysis shows that all eight combinations are 
achieved. For the path coverage criterion to be met, we again require 
four tests, e.g., 


ace: A = 2, B = 0, X = 4 
acd: A = 3, B = 0, X = 3 
abe: A = 1, B = 0, X = 2 
abd: A = 1, B = 1, X = 1 

Note, however, that this test set would not satisfy the multiple 
condition coverage criterion. 
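
The claims about the three test sets can be checked with a small sketch of the Fig. 2 segment. The figure itself is not available here, so the segment is reconstructed under assumptions: the first decision is taken to be "A > 1 and B = 0" (guarding X ← X/A on branch c), the second to be "A = 2 or X > 1" (branch e), and the assignment X ← X + 1 on branch e is a guess. The simple conditions and the path labels come from the text; the connectives, the branch-e assignment, and all function names are hypothetical. With these choices every test listed in the text follows the path it is labeled with.

```python
def segment(A, B, X):
    """Run the two-decision segment; return (path, decision-1 outcomes, decision-2 outcomes)."""
    d1 = (A > 1, B == 0)
    if d1[0] and d1[1]:               # assumed connective: AND
        X = X / A
        path = "ac"
    else:
        path = "ab"
    d2 = (A == 2, X > 1)
    if d2[0] or d2[1]:                # assumed connective: OR
        X = X + 1                     # assumed assignment on branch e
        path += "e"
    else:
        path += "d"
    return path, d1, d2

def branch_coverage(tests):
    edges = set("".join(segment(*t)[0] for t in tests))
    return {"b", "c", "d", "e"} <= edges

def multiple_condition_coverage(tests):
    runs = [segment(*t) for t in tests]
    # Two simple conditions per decision give four combinations at each decision.
    return len({r[1] for r in runs}) == 4 and len({r[2] for r in runs}) == 4

def path_coverage(tests):
    return {segment(*t)[0] for t in tests} == {"abd", "abe", "acd", "ace"}

branch_tests = [(3, 0, 3), (2, 1, 1)]
condition_tests = [(2, 0, 4), (2, 1, 1), (1, 0, 2), (1, 1, 1)]
path_tests = [(2, 0, 4), (3, 0, 3), (1, 0, 2), (1, 1, 1)]

for name, tests in [("branch", branch_tests),
                    ("multiple condition", condition_tests),
                    ("path", path_tests)]:
    print(name, branch_coverage(tests),
          multiple_condition_coverage(tests), path_coverage(tests))
```

Under these assumptions the path test set misses the combination (A > 1, B ≠ 0) at the first decision, and the multiple condition test set never traverses acd, so neither criterion subsumes the other.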




It is clear that “statement coverage” and “node coverage” are in 
themselves rather weak strategies for testing, representing necessary 
but by no means sufficient criteria for a reasonable structural test. 
The “branch coverage” criterion, however, implies the other two (as 
seen in the diagram above) and has come to be regarded as a minimal 
standard of achievement in structure-based testing. The stronger 
criteria of “multiple condition coverage” and “path coverage” are 
difficult to achieve in a program of any complexity. In fact, the path 
testing criterion is usually relaxed to the extent that only “equivalence 
classes” of paths are represented. In a program of any size, particularly 
in the presence of program loops, there is a virtual infinity of paths 
through the program graph. Two paths are then considered “equiva- 
lent” if they differ only in their number of loop traversals. One then 
chooses only one representative from each such equivalence class in 
devising a test set. But still, this modified path coverage criterion is 
difficult to achieve in practice. 

A survey of the literature shows that there is little common agree- 
ment as to what would be considered as an ‘adequate’ structural test 
criterion. As we have noted, the “branch coverage” criterion has been 
widely recognized as a basic measure of testing thoroughness. This is 
evidenced by the fact that most of the major software testing tools in 
existence or in development do indeed include some provision for 
achieving this particular test goal. The disagreement seems to be in 
deciding how much more (or less) is needed beyond this basic require- 
ment to entitle a structural testing strategy to be considered adequate. 

If total branch coverage is indeed used as a measure of testing 
thoroughness, a simple calibration scheme can be invoked, using a set 
of software counters. One “prepares” the program for testing by in- 
serting counters at appropriate points in a modified copy of the 
program, and after running through the test set, one can determine 
the degree of thoroughness from a listing of the resulting counter 
values. This is the method of test instrumentation. We first define a 
decision to decision (DD) path of a program to be a sequence of 
statements leading from a decision box (or the “start” node) to a 
decision box (or the “stop” node), having no intervening decisions. To 
determine whether every branch of our program has been encountered 
at least once (branch coverage) in our testing, it is sufficient to insert 
a counter at the ‘head’ of each DD path. 

Fig. 3—Flowchart for computing x^y.

Consider the classical flowchart solution (Fig. 3) to the problem of
computing z = “x to the power y”. Here, there are five DD paths:

abc, d, efhi, gfhi, jk

and we therefore insert our software counters at the points a, d, e, g,
j, as shown. If we have run two test cases as shown in the table below,


[Table of the two test cases; entries illegible in the source.]


we would find that ‘counter g’ has not yet been activated. Inspection 
of the flowchart then shows that we need a test to traverse the path 
a, b, c, d, g, etc., requiring y ≠ 0, y mod 2 ≠ 1. So we may use
x = 24 and y = 2, say, as an additional test case, thus ensuring complete 
branch coverage. Ideally, this latter phase, directing the tester to the 
area of untested code, would also be automated. 
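The instrumentation scheme can be sketched in a few lines of Python. This is only an illustration: Fig. 3 is not reproduced here, so the loop body below (repeated squaring) is an assumed reading of the flowchart, and the two initial test cases are hypothetical ones chosen so that y is odd at every loop test. The counter names follow the DD-path heads a, d, e, g, j named in the text.

```python
from collections import Counter

# One software counter per DD-path head (a, d, e, g, j).
counters = Counter()


def power(x, y):
    """Return x**y for an integer y >= 0, counting DD-path traversals.

    The loop body is an assumed reading of Fig. 3 (repeated squaring).
    """
    counters["a"] += 1          # head of DD path abc: start to loop test
    z = 1
    while y != 0:
        counters["d"] += 1      # head of DD path d: loop test to parity test
        if y % 2 == 1:
            counters["e"] += 1  # head of DD path efhi: y is odd
            z = z * x
        else:
            counters["g"] += 1  # head of DD path gfhi: y is even
        x = x * x
        y = y // 2
    counters["j"] += 1          # head of DD path jk: loop exit to stop
    return z


def untested():
    """List the DD-path heads whose counters are still zero."""
    return [head for head in "adegj" if counters[head] == 0]


# Two hypothetical test cases in which y is odd at every loop test.
power(2, 0)
power(3, 1)
print(untested())   # ['g']: counter g has not been activated

# The additional test case suggested in the text (y != 0, y mod 2 != 1).
power(24, 2)
print(untested())   # []: every branch has now been covered
```

Listing the zero counters is the "calibration" step; choosing the input that reaches them is still left to the tester in this sketch.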

Of course, we would like to automate as much of the testing meth- 
odology as possible, recalling the three-stage process mentioned earlier. 
On the other hand, in lieu of a complete mechanization, the testing
instrumentation scheme presented here can be of great help in isolating
areas in need of further testing. Furthermore, it can be argued that
for the little extra cost entailed, it is a worthwhile investment in any 
testing process, fully automated or not. [Parenthetically, we might 
note as an indication of the expense associated with the development 
of testing tools generally, that a package that does little more than 
“test instrumentation,” as described here, has been announced recently 
(Computer, May 1982) by Management and Computer Services, selling 
for $12,000.00!] It may be that some criterion other than “branch
coverage” is being used as a measure of test thoroughness. Even so, it is
good practice to be concerned as to what extent this standard measure
is being met. Moreover, it is reasonable to suppose that the
“instrumentation concept,” as exemplified here, might generalize to settings
where other thoroughness criteria are being used. 


IV. TEST PATH SELECTION 


As we have indicated, there are a number of criteria that can be 
used in selecting program paths to achieve an adequate testing cover- 
age. But the question then becomes: How do we automatically generate 
a collection of paths meeting a given criterion? The literature is 
somewhat “hazy” on this point. Perhaps the most explicit treatment 
of the problem is that of Paige, in reference to programs built up
from a strict adherence to the structured programming methodology. 
In fact, we know of no more general approach to the problem, one 
that would handle structured or unstructured code in relation to the 
whole spectrum of path selection criteria. 

Recall that a structured program F is one that has been built up 
inductively from certain “simple statements” as a base (typically, the 
assignment, input and output statements, and procedure calls), using 
only the three familiar constructs: 

1. Sequence: begin P1; P2; ... ; Pn end

2. Selection: if C then P else Q 

3. Repetition: while C do P 
for structured (but possibly themselves compound) statements Pi, P, 
Q, respectively. The resulting program graph F = (V, E) is then of a 
correspondingly restricted form, greatly facilitating the path analysis 
problem. Perhaps collapsing sequences of simple statements to a single 
block (graph node), one may then use a “regular expression” r(F) to
characterize the program flow, associating

1. The ‘·’ operator to sequences

2. The ‘+’ operator to selections

3. The (Kleene) ‘*’ operator to repetitions,
respectively. 

For example, if F is the (structured) program graph shown in Fig.
4, we have

r(F) = a(b(d + e)(k + l) + c(f + g(h(i + j))*m))


as the corresponding regular expression. Note the loop (h(i + j))* 
resulting from a “while” statement. 

We have mentioned earlier, in reference to the modified path cov- 
erage criterion, how an equivalence relation is often used to obtain a 
finite representation of the path alternatives in the presence of loops. 
Accordingly, if we make the substitution x* = x + 1 (1 = null) in the 
regular expression r(F), we acknowledge that a loop is either executed
or not. Multiplying out so as to obtain a “sum of products” expression, 
one then obtains the desired collection of paths satisfying the modified 
path coverage criterion, e.g., 


abdk acf 
abdl acghim 
abek acghjm 
abel acgm 


in reference to the program graph of Fig. 4. On the other hand, it does 
not appear that this technique can be extended to handle unstructured 
programs. 
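The substitution x* = x + 1 followed by multiplying out can be sketched as a small recursive expansion. The nested-tuple encoding of r(F) and the function name `expand` are illustrative choices, not part of the original presentation; the empty string plays the role of the null symbol 1.

```python
from itertools import product

# r(F) for Fig. 4 as a nested tuple: ("seq", ...), ("alt", ...), ("star", r).
# A plain string is a single edge label.
R_F = ("seq", "a",
       ("alt",
        ("seq", "b", ("alt", "d", "e"), ("alt", "k", "l")),
        ("seq", "c",
         ("alt", "f",
          ("seq", "g", ("star", ("seq", "h", ("alt", "i", "j"))), "m")))))


def expand(r):
    """Return the sum-of-products path list after replacing x* by x + 1."""
    if isinstance(r, str):
        return [r]
    op, *args = r
    if op == "alt":
        return [p for arg in args for p in expand(arg)]
    if op == "star":
        return expand(args[0]) + [""]   # loop body once, or not at all
    if op == "seq":
        return ["".join(parts) for parts in product(*map(expand, args))]
    raise ValueError(f"unknown operator: {op}")


paths = expand(R_F)
print(len(paths), paths)
```

For the expression above this yields the eight paths listed in the text, one representative per equivalence class.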

But if we continue to deal with a structured program graph, we can 
describe a method for deriving a minimum number of paths sufficient 
to meet the “branch coverage” criterion. We assign a set of paths S(r) 
to each regular expression r = r(F) inductively, as follows. We let 
S(a) = {a} for each simple statement a, and then, assuming that S(r)
and S(t) have been defined for regular expressions r and t, we set:
1. S(r·t) = S(r)·S(t)
2. S(r + t) = S(r) ∪ S(t)
3. S(r*) = S(r)*.
Here, S(r)* is the singleton set obtained by concatenating (in any
order) all of the paths in S(r), and similarly, the set product S(r)·S(t)
is obtained by concatenating paths in S(r) with those in S(t), but
retaining only enough products so that each of the factors from S(r)
and S(t) is represented. It follows that
1. |S(r·t)| = max{|S(r)|, |S(t)|}
2. |S(r + t)| = |S(r)| + |S(t)|
3. |S(r*)| = 1.

Fig. 4—Structured program graph illustrating modified path coverage criterion.

By way of illustration, in considering once again the example from 
Fig. 4, we may compute: 
S(i + j) = {i, j}
S(h(i + j)) = {hi, hj}
S((h(i + j))*) = {hihj}
S(g(h(i + j))*m) = {ghihjm},


etc., and finally, 
S(r) = {abdk, abel, acf, acghihjm}, 


yielding four paths that together cover all of the branches of the 
program.
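The inductive definition of S(r) can be sketched as follows. The tuple encoding and the name `branch_cover` are illustrative. The reduced set product is realized here by pairing the two path lists position by position and reusing the last element of the shorter list; this is one of several pairings that keeps every factor represented, chosen because it reproduces the four paths given in the text.

```python
# r(F) for Fig. 4 as a nested tuple: ("seq", ...), ("alt", ...), ("star", r).
R_F = ("seq", "a",
       ("alt",
        ("seq", "b", ("alt", "d", "e"), ("alt", "k", "l")),
        ("seq", "c",
         ("alt", "f",
          ("seq", "g", ("star", ("seq", "h", ("alt", "i", "j"))), "m")))))


def branch_cover(r):
    """Return S(r): a small set of paths covering every branch of r."""
    if isinstance(r, str):
        return [r]                                   # S(a) = {a}
    op, *args = r
    if op == "alt":                                  # S(r + t) = S(r) U S(t)
        return [p for arg in args for p in branch_cover(arg)]
    if op == "star":                                 # S(r*): one concatenated path
        return ["".join(branch_cover(args[0]))]
    if op == "seq":                                  # reduced set product
        result = [""]
        for arg in args:
            paths = branch_cover(arg)
            size = max(len(result), len(paths))
            result = [result[min(i, len(result) - 1)] + paths[min(i, len(paths) - 1)]
                      for i in range(size)]
        return result
    raise ValueError(f"unknown operator: {op}")


print(branch_cover(R_F))   # ['abdk', 'abel', 'acf', 'acghihjm']
```

Each rule keeps the cardinalities stated above: a product has max{|S(r)|, |S(t)|} paths, a sum has |S(r)| + |S(t)|, and a starred expression has one.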

Once again, as in the case of the previous algorithm, there seems to 
be no easy extension of this technique that would handle unstructured 
programs as well. However, a general upper bound is readily available 
regarding the number of paths necessary for total branch coverage. 
Whether our program is structured or not, we make the observation 
that if a test path reaches a particular node of the program graph, 
then it must exit this node through one of the (two) branches leaving 
the node. If the graph F has e edges and n nodes, it follows that 


v(F) = e − (n − 2) = e − n + 2


is an upper bound on the number of paths necessary to achieve total 
branch coverage. Coincidentally, this is the formula for McCabe’s
cyclomatic complexity measure, a figure that has proved to be useful
in estimating overall “program complexity”. At the same time, the
graph theoretic derivation of a program’s “independent” circuits
(paths) yields a branch covering of paths, v(F) in number, though
generally somewhat in excess of the minimum number of paths that
would be required. In the context of our running example, we have
v(F) = 13 − 8 + 2 = 7 and a corresponding set of basis paths:


acf     acgm
abdk    acghim
abek    acghjm.
abel


Note that the single path abdl from our “path coverage” list that is 
not present here is itself a linear combination of paths already listed. 
Note as well that we obtain seven paths here, whereas we know from 
the preceding analysis that four paths will suffice (for branch cover- 
age). 
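The count v(F) = 13 − 8 + 2 = 7 can be checked mechanically from an edge list. Since Fig. 4 is not reproduced here, the node numbering below is an assumption: it is one topology consistent with the regular expression r(F) and with the vertex sets of Table II.

```python
# Assumed edge list for Fig. 4: label -> (tail node, head node).
EDGES = {
    "a": (1, 2), "b": (2, 3), "c": (2, 4), "d": (3, 5), "e": (3, 5),
    "k": (5, 8), "l": (5, 8), "f": (4, 8), "g": (4, 6), "m": (6, 8),
    "h": (6, 7), "i": (7, 6), "j": (7, 6),
}


def cyclomatic_number(edges):
    """Return v(F) = e - n + 2 for a program graph given as an edge list."""
    nodes = {node for pair in edges.values() for node in pair}
    return len(edges) - len(nodes) + 2


print(cyclomatic_number(EDGES))   # 7
```

The bound of seven paths is then compared with the four paths found earlier for the same graph.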

The whole notion that McCabe’s basis of program paths should 
constitute a goal of program testing has attracted considerable atten- 
tion, and we feel obliged to comment on this point. Perhaps this is 
best done by listing what we think are the pros and cons to the 
approach. On the positive side, we cite the following: 

1. The method is sufficiently general as to be applicable to both 
structured and unstructured programs. 

2. The resulting “basis” does indeed ensure total branch coverage. 

3. The paths of a basis are feasibly computable, using standard 
graph theoretic techniques. 

On the other hand, these aspects must be counterbalanced with the 
following: 

1. A single basis is not uniquely determined—there are many, and 
we must make a choice. 

2. The number v(F) of paths in a basis can greatly exceed the 
minimum number of paths required to achieve branch coverage. 

3. The notion that, in some sense, every path in the program graph 
is accounted for by our having selected a basis is somewhat specious. 
Note that we do not comment here on the inadequacies of McCabe’s
v(F) as a measure of overall program complexity; we leave this
discussion to a separate paper. On the other hand, the arguments for
and against the use of the associated “basis of program paths” as a
testing strategy are inconclusive at best, particularly in comparison
with the “level paths” of Paige that we now describe.

In a program graph F = (V, E), a level-0 path is a simple (acyclic) 
path from “start” to “stop”. In effect, these paths trace the “fall 
through” conditions in the program. Then, inductively, a level-i path 
(i > 0) is a simple path (perhaps a circuit) that begins and ends on
nodes of a path of lower level, but has none of its other nodes previously 
appearing on paths of a lower level. Intuitively, the level-i paths for 
i > 0 account for program loops of increasing nesting level and for
feedback paths, etc., in the case of an unstructured program. 

Considering our earlier structured program graph (Fig. 4) and the
unstructured program graph of Fig. 5, we tabulate the respective
level-i paths as shown in Tables II and III.
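The inductive definition of level paths lends itself to a depth-first enumeration. The sketch below is not Paige's published algorithm; it is a direct transcription of the definition, applied to an assumed edge list for Fig. 4 (the figure itself is not reproduced here) and assuming single-character edge labels.

```python
# Assumed edge list for Fig. 4: label -> (tail node, head node).
EDGES = {
    "a": (1, 2), "b": (2, 3), "c": (2, 4), "d": (3, 5), "e": (3, 5),
    "k": (5, 8), "l": (5, 8), "f": (4, 8), "g": (4, 6), "m": (6, 8),
    "h": (6, 7), "i": (7, 6), "j": (7, 6),
}


def level_paths(edges, start, stop):
    """Return [level-0 paths, level-1 paths, ...] as lists of edge strings."""
    out = {}
    for label, (tail, head) in edges.items():
        out.setdefault(tail, []).append((label, head))

    def search(sources, targets, known, used):
        found = []

        def dfs(node, interior, labels):
            for label, head in out.get(node, []):
                if label in used:
                    continue                      # edge belongs to a lower level
                if head in targets:
                    found.append("".join(labels + [label]))
                elif head not in known and head not in interior:
                    dfs(head, interior | {head}, labels + [label])

        for source in sorted(sources):
            dfs(source, frozenset(), [])
        return found

    levels = []
    known, used = {start, stop}, set()
    paths = search({start}, {stop}, known, used)   # level 0: start to stop
    while paths:
        levels.append(paths)
        for path in paths:
            for label in path:
                used.add(label)
                known.update(edges[label])
        paths = search(known, known, known, used)  # next level
    return levels


for level, paths in enumerate(level_paths(EDGES, 1, 8)):
    print(level, paths)
```

On this graph the enumeration gives six level-0 paths and the two level-1 circuits hi and hj, eight level paths in all.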

In any case, we are able to say that a given level-(i + 1) path is 
“associated with” a certain level-i path according as the given path 
begins and ends on nodes of the parent path. This relationship orders 
the level paths in a tree-like structure, in such a way that one can 
readily construct test paths that again effect a total branch coverage. 
In so doing, only level paths that associate can be combined to form a 
program test path. Thus, for example, in the case of the program graph 
of Fig. 4 above, we may construct the path acghihjm as the linear 
combination: 


acghihjm = (6) + (7) + (8) 


using the notation in Table II. 
It is clear that the level paths of a program graph span the space of 





Fig. 5—Unstructured program graph. 


Table II—Level-i paths for structured program graph
(see Fig. 4)

Level   Level Paths   V[i]              E[i]
−1                    {1, 8}            ∅
0       (1) abdk      {2, 3, 4, 5, 6}   {a, b, c, d, e, f, g, k, l, m}
        (2) abdl
        (3) abek
        (4) abel
        (5) acf
        (6) acgm
1       (7) hi        {7}               {h, i, j}
        (8) hj

Note: The sets V[i] and E[i] of vertices and edges at level-i are useful in the
computation of the level-(i + 1) paths.

Table III—Level-i paths for unstructured program graph
(see Fig. 5)

Level   Level Paths   V[i]              E[i]
−1                    {1, 7}            ∅
0       (1) acgjl     {2, 3, 4, 5, 6}   {a, b, c, f, g, h, j, k, l}
        (2) acgkl
        (3) abfgjl
        (4) abfgkl
        (5) ach
        (6) abfh
1       (8) e         ∅                 {d, e, i}

Note: The sets V[i] and E[i] of vertices and edges at level-i are useful in
the computation of the level-(i + 1) paths.


program paths. But taken together, they do not usually constitute a 
basis. Thus, again in Fig. 4 above, we have eight level paths, whereas
we know from our previous analysis that this graph has rank v = 7. 
On the other hand, Paige’s level paths have a definite uniqueness, an 
advantage over the notion of a basis as developed by McCabe, and 
leading to a graduated level path testing strategy as follows: 

1. First test all level-0 paths, in effect keeping all loops in the
“nonexecuting” mode.

2. Next test all level-1 paths, reaching them through their associated 
level-0 paths, etc. 

The result is a highly structured testing strategy where segments of 
the program are treated in successive layers of nesting depth. 

The level path testing strategy provides for a rather exhaustive 
treatment of a program’s path structure at successive depths of nesting. 
In this sense, the approach has a potential thoroughness rivaling that 
of the “modified path coverage” criterion. On the other hand, Paige’s 
strategy is readily applicable to both structured and unstructured 
programs. At the same time, his method lends itself to a convenient 
algorithmic solution, though one must be prepared to compute all
simple paths (or circuits) between various identified pairs of nodes, 
along edges not previously used—most likely requiring the use of a 
“depth first search” strategy. Except for this computational difficulty, 
the approach is quite orderly; it provides for a more thorough testing 
than simple branch coverage, and it compares favorably against 
McCabe’s “basis of program paths” in that: 

1. The level paths are uniquely determined. 

2. The number of level paths will exceed v(F). 

3. The notion that somehow every path in the program graph is 
accounted for by our successive treatment of its levels has a good deal 
more credibility.

In conclusion, it must be noted that all of the methods we have
discussed for selecting program test paths are subject to one overriding 
criticism. There is absolutely no assurance that the paths selected will 
be feasible, i.e., executable with an appropriate choice of input data. 
We suggest that this problem becomes more serious (and is surely 
more difficult to analyze) in the case of paths selected in an attempt 
to minimize the number required for branch coverage. Paige’s “level 
path” strategy would seem to be easier to handle in this respect, since 
we build up paths from the simple to the more complex, starting with 
those that are more likely to be feasible. 


V. TEST CASE GENERATION 


The whole question of path feasibility is related to the “test case 
generation” problem. This is the one remaining phase to be discussed 
of the three that were outlined in the rectangular problem-decompo- 
sition paradigm of Section III. Considering the question of feasibility, 
however, we can see that it is difficult to so trichotomize the automa- 
tion of the overall testing program. Though useful as a paradigm, we 
must admit that this partition of the problem is overly idealistic in 
relation to the real world of program testing that we are likely to 
encounter. 

Suppose we have selected a set of program paths because they meet 
one or another of the test coverage criteria, or for whatever reason. 
There still remains the problem of generating corresponding test cases 
that will drive the program through the indicated paths. This again 
turns out to be a nontrivial (and in some cases, unsolvable) problem. 
All we can do at this point is to summarize the approaches that have 
been taken by researchers in the field and to give a few suggestions 
that might aid in developing a workable methodology. 

Perhaps the most comprehensive treatment of the problem is that 
of Clarke. Consider a single path p from “start” to “stop” through
a program F = (V, E), again viewed as a directed graph. We intend to 
show how p may be characterized as a path predicate, i.e., a logical 
condition 


P = P1 ∧ P2 ∧ ... ∧ Pn

expressed as a conjunction of “interpreted” branch conditions derived
from the labels on the edges of F. Inductively, we may think of p as
having developed as a sequence of “partial paths”:

p(k) = (v[0] = start, v[1], ..., v[k])

leading from “start” to some intermediate vertex v[k] on the way to
“stop”. Correspondingly, we may give an inductive derivation of the
path predicate, writing P(0) = true and

P(k) = P(k − 1) ∧ ibp(v[k − 1], v[k]),


where the latter conjunct is the interpreted branch predicate associated 
with the transition from vertex v[k − 1] to v[k]. More precisely, ibp(e)
for an edge e labeled with the Boolean condition C will be computed 
by substituting (in C) the current “symbolic values” of all variables 
according to their updating along the partial path p(k). 

Fig. 6—Flowchart for estimating the point where a function is maximized.

For example, consider the flowchart solution (Fig. 6) for estimating
the point where a function f takes on its maximum value. For the path
p = abclmn, we compute

P(0) = true
P(1) = P(0) ∧ true = true
P(2) = P(1) ∧ true = true
P(3) = P(2) ∧ (b − a ≤ c) = (b − a ≤ c)
P(4) = P(3) ∧ true = (b − a ≤ c)

and finally,

P = P(5) = P(4) ∧ true = (b − a ≤ c),

noting that it was necessary to substitute b − a for w in the condition
for traversing edge l because of the earlier assignment statement.

In general, this process of continually updating the symbolic values 
of program variables as we proceed along a path is called symbolic 
execution (or symbolic evaluation). The data descriptions generated in 
symbolic execution provide a precise representation of the changing 
program state. Initially, the program state is the three-place vector: 


state = [start, values(start), pathpred(start)]
      = (start, (⊥, ⊥, ..., ⊥), true),


where “values” tabulates the symbolic values of all program variables
(⊥ = undefined), and “pathpred” stores the inductively generated path
predicate P as described earlier. Symbolic names are assigned (in 
“values”) to input variables whenever a read statement is encountered 
on the program path. Throughout the symbolic evaluation, all symbolic 
representations of variable and branch predicate values are then 
expressed in terms of these symbolic names, as representatives of 
input values. In particular, as one encounters an assignment statement 
(v ← e), the symbolic value of the program variable v is updated (as
in the example above) through substitution of the symbolic value of 
the expression e. In this way, “state” and especially the “values” vector 
will provide a continually updated snapshot of the program’s devel- 
opment along the path. Moreover, the final value of the “pathpred” 
component of “state” provides the logical conjunction described above. 
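A minimal sketch of this forward symbolic execution, using plain string substitution, is given below. Fig. 6 is not reproduced here, so the statements attached to edges a, b, c, l, m, n are assumptions inferred from the worked example (a reads the inputs, b assigns w ← b − a, l carries the condition w ≤ c). The names `PROGRAM`, `substitute` and `symbolic_execute` are illustrative.

```python
import re

# Assumed actions on the edges of the path p = abclmn in Fig. 6.
PROGRAM = {
    "a": [("read", "a", "b", "c")],
    "b": [("assign", "w", "b - a")],
    "c": [],
    "l": [("branch", "w <= c")],
    "m": [],
    "n": [],
}


def substitute(expr, values):
    """Replace each tracked variable in expr by its current symbolic value."""
    def repl(match):
        name = match.group(0)
        value = values.get(name)
        if value is None:                 # untracked name, e.g. a called function
            return name
        return value if value.isidentifier() else f"({value})"
    return re.sub(r"[A-Za-z_]\w*", repl, expr)


def symbolic_execute(path, program):
    """Return (values, pathpred) after symbolically executing the path."""
    values, pathpred = {}, []             # empty conjunction stands for "true"
    for edge in path:
        for kind, *args in program[edge]:
            if kind == "read":            # inputs get symbolic names
                for name in args:
                    values[name] = name
            elif kind == "assign":
                var, expr = args
                values[var] = substitute(expr, values)
            elif kind == "branch":
                pathpred.append(substitute(args[0], values))
    return values, pathpred


values, pathpred = symbolic_execute("abclmn", PROGRAM)
print(values["w"])    # b - a
print(pathpred)       # ['(b - a) <= c']
```

The single conjunct produced is the predicate b − a ≤ c obtained in the example above, expressed in the input names alone.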

This path predicate P defines a corresponding (path) subdomain of 
the input space D, and by the nature of the symbolic evaluation 
technique, P is expressed as a set of conditions on the input variables 
alone. To generate a test case (of input data) that will cause the 
program to traverse the path p, it is then only necessary to find input 
values that satisfy all of these conditions. As we might expect, however, 
this is often easier said than done. 


3096 THE BELL SYSTEM TECHNICAL JOURNAL, DECEMBER 1983 


Before discussing this problem in any detail, it is better that we first 
describe an alternative to the above approach, one that proceeds in 
reverse—from the end of the path to its beginning. This technique, 
known appropriately as backward substitution, is best described in the 
survey paper by Huang. To traverse a path p, certain conditions
must be met, i.e., the set of branch conditions (C or ~C) along the
path must be satisfied as they are encountered. On the other hand,
suppose that an assignment statement (v ← e) intervenes between
“start” and the predicate Q, the latter representing a given branch
condition (though modified by “partial backward substitution” as we
are now describing). In the following flowchart segment:

        R
        ↓
      v ← e
        ↓
        Q

if we want Q to be true after the assignment (v ← e) has been executed,
then the predicate Q(v ← e) must be satisfied prior to its execution.
Here, Q(v ← e) is the predicate obtained by substituting the expression
e for each occurrence of v in Q [and we speak of Q(v ← e) as the
predicate obtained by dragging Q backward through the indicated
assignment statement]. It follows that the conjunction


R ∧ Q(v ← e)


is necessary for our passage along the edge with condition R (through 
the assignment) and then to satisfy Q. 

Altogether, if we want the specific path p to be traversed in a 
program’s execution, then we must drag each of its edge conditions 
backward to “start”, and the conjunction of all resulting predicates 
must be satisfied by the corresponding test case of input data. Note 
once again that we obtain in this way a corresponding path predicate: 


P = P1 ∧ P2 ∧ ... ∧ Pn,


i.e., a conjunction of modified branch conditions, each expressed in 
terms of the input variables to the program. 

Consider once again the example of Fig. 6, and suppose we wish to 
traverse the path abcdefghikclmn. The listing shown below traces the 
dragging of the three necessary branch conditions backward along this
path.

Edge  Predicates after dragging backward through the edge
l     w ≤ c
c     w ≤ c
k     b − a ≤ c
i     b − p ≤ c                u < v
h     b − p ≤ c                u < v
g     b − p ≤ c                u < f(q)
f     b − p ≤ c                u < f(b − w/3)
e     b − p ≤ c                f(p) < f(b − w/3)
d     b − (a + w/3) ≤ c        f(a + w/3) < f(b − w/3)               ~(w ≤ c)
c     b − (a + w/3) ≤ c        f(a + w/3) < f(b − w/3)               ~(w ≤ c)
b     b − a − (b − a)/3 ≤ c    f(a + (b − a)/3) < f(b − (b − a)/3)   ~(b − a ≤ c)
a     b − a − (b − a)/3 ≤ c    f(a + (b − a)/3) < f(b − (b − a)/3)   ~(b − a ≤ c)


One finally obtains the conjunction of three predicates: 


P1: b − a − (b − a)/3 ≤ c
P2: f(a + (b − a)/3) < f(b − (b − a)/3)
P3: ~(b − a ≤ c)


all expressed in terms of the inputs a, b, c (and the “called” function 
f). For purposes of comparison, the reader may try to compute an 
equivalent predicate using the symbolic execution method. 
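Backward substitution is easy to mechanize with textual replacement. In the sketch below the edge annotations are not taken from Fig. 6 (which is not reproduced here); they are inferred from the listing above, under the assumption that each edge carries at most one branch condition, tested first, followed by at most one assignment. Negation is written as Python's `not`.

```python
import re

# Assumed (condition, assignment) pairs for the edges of Fig. 6.
PROGRAM = {
    "a": (None, None),                          # read a, b, c
    "b": (None, ("w", "b - a")),
    "c": (None, None),
    "d": ("not (w <= c)", ("p", "a + w/3")),
    "e": (None, ("u", "f(p)")),
    "f": (None, ("q", "b - w/3")),
    "g": (None, ("v", "f(q)")),
    "h": (None, None),
    "i": ("u < v", ("a", "p")),
    "k": (None, ("w", "b - a")),
    "l": ("w <= c", None),
    "m": (None, None),
    "n": (None, None),
}


def drag(predicate, var, expr):
    """Return Q(v <- e): substitute expr for each occurrence of var."""
    simple = re.fullmatch(r"\w+(\(\w+\))?", expr) is not None
    replacement = expr if simple else f"({expr})"
    return re.sub(rf"\b{re.escape(var)}\b", lambda m: replacement, predicate)


def drag_backward(path, program=PROGRAM):
    """Drag every branch condition on the path back to "start"."""
    predicates = []
    for edge in reversed(path):
        condition, assignment = program[edge]
        if assignment is not None:
            var, expr = assignment
            predicates = [drag(q, var, expr) for q in predicates]
        if condition is not None:
            predicates.append(condition)
    return predicates


for predicate in drag_backward("abcdefghikclmn"):
    print(predicate)
```

The three strings printed carry extra parentheses but are algebraically the predicates P1, P2 and P3 above; only the branch conditions and the assignments met on the way back are ever stored.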

In an overall comparison of these two methods, one can identify an 
obvious “trade-off.” With backward substitution, we avoid the costly 
storage facility needed for the continuous updating of all the symbolic 
program variable values. On the other hand, an important advantage 
accrues to the symbolic execution method, one that is not available 
for the backward substitution technique. Namely, we are more easily 
able to determine whether a given path is (or will be) feasible. And we 
can make the determination early in the symbolic evaluation. We need 
only check that the inductively generated predicates P(k) are noncon- 
tradictory, as far as they go. We begin with P(0) = true—certainly 
there is no contradiction here. Then, in the expression for P(k) in
terms of P(k − 1), we have only to see whether ibp(v[k − 1], v[k])
contradicts P(k − 1). If so, P(k) and hence P itself is contradictory,
and the path p is infeasible. Otherwise, we keep going. Note, in 
comparison, that with the backward substitution method, we wouldn’t 
know whether a path was feasible until all of the calculation (of P) 
was completed—a definite disadvantage. 

One must note, however, that all such “logical satisfiability” prob- 
lems as we are now beginning to consider are exceedingly difficult to 
handle in practice. We include here the satisfiability question that 
results from the use of the “backward substitution” technique or the 
forward “symbolic evaluation” method, whichever is used. At the
conclusion of the backward substitution, we have a system of constraints
on the inputs to the problem, and unless these constraints
can be “solved” for the input data, we don’t have a test case at all. 
The same may be said for the forward symbolic execution, except for 
the slight advantage that we can be determining the satisfiability (or 
lack thereof) as we go. 

Huang, in his survey paper, presents a systematic approach for
handling the satisfiability problem, and we now outline the major 
features of his plan. The simplifying assumption is made that the path 
predicate takes the form: 


P = P1 ∧ P2 ∧ ... ∧ Pn,

where the Pi are nonnegated atomic expressions:

d R e

with d, e arithmetic expressions in the input variables and R one of
the six relational operators: <, ≤, =, ≠, ≥, >. Such a system of atomic
logical expressions can readily be rewritten in the prenex normal form:


(Ex1)(Ex2) ... (Exn)(x1 = e1) ∧ (x2 = e2) ∧ ... ∧ (xn = en),


where the E’s are “existential quantifiers” on auxiliary variable x’s, 
and the new expressions (the e’s) are differences of d, e above, sufficient 
to transform any inequalities to equalities. The inequalities are, in 
effect, shifted to the auxiliary variables, thereby serving to normalize 
the solution space. Thus, in place of the inequality 2(b − a)/3 ≤ c
at the end of the table above, we would have (Ex1 ≥ 0)[x1 = c −
2(b − a)/3]. Altogether, the three inequalities of that problem are
similarly transformed, and we have instead the prenex normal form:


(Ex1 ≥ 0)(Ex2 ≥ 0)(Ex3 > 0)

x1 = c − 2(b − a)/3
x2 = b + 2a − 6
x3 = b − a − c,

one that is somewhat easier to handle.
From this point, standard techniques of linear algebra can be used 
to further transform the system into one where a minimum number 


of variables are involved. Thus, in the case of our running example, 
we can simplify the system so as to finally obtain: 


(Ex1 ≥ 0)(Ex2 ≥ 0)(Ex3 > 0)
3x1 − x2 + 3x3 = 6 − 3a.


From here, one may “guess” a solution, e.g., x1 = x2 = 0 and x3 = 0.1
say. One thereby obtains:


a = 1.9
b = 2.2
c = 0.2,


an input set that will cause the program to traverse the path abcdef- 
ghikclmn in Fig. 6, as originally required. 
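The arithmetic of the “guess” can be checked by back-substituting the auxiliary values into the three defining equations. The sketch below uses exact rational arithmetic to avoid rounding; the function name `solve_inputs` is illustrative.

```python
from fractions import Fraction


def solve_inputs(x1, x2, x3):
    """Recover the inputs (a, b, c) from guessed auxiliary values."""
    a = (6 - (3 * x1 - x2 + 3 * x3)) / 3   # from 3x1 - x2 + 3x3 = 6 - 3a
    b = x2 + 6 - 2 * a                     # from x2 = b + 2a - 6
    c = b - a - x3                         # from x3 = b - a - c
    return a, b, c


x1, x2, x3 = Fraction(0), Fraction(0), Fraction(1, 10)
a, b, c = solve_inputs(x1, x2, x3)

# The remaining equation x1 = c - 2(b - a)/3 must hold as well.
assert x1 == c - 2 * (b - a) / 3
print(float(a), float(b), float(c))   # 1.9 2.2 0.2
```

Any other choice with x1 ≥ 0, x2 ≥ 0 and x3 > 0 gives a further candidate test case by the same back-substitution.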

If we are going to have to “guess” a solution to the feasibility 
question in the end, however, the outright “trial and error” approach 
of Ramamoorthy et al. offers an attractive alternative. One makes
the assumption, as before, that the path predicate P is expressed as a 
logical conjunction: 


P = P1 ∧ P2 ∧ ... ∧ Pn,


where each of the Pi is a constraint on the program’s input variables.
Moreover, it is assumed that the input variables have been ordered as
v[1], v[2], ..., v[m]. With each variable v[i], we associate a set S[i]
of conjuncts from P, namely:


S[i] = {Pj : only v[1], ..., v[i] occur in Pj}


and these sets are then used as the basis for the “trial and error” 
algorithm shown in Fig. 7. 

Assuming that values have been found for v[1], ..., v[i − 1]
satisfying all the conjuncts in S[1], ..., S[i − 1], we either solve for
v[i] or randomly choose v[i], depending on whether the set S[i]
contains an equality relation in v[i]. We then substitute this value in
the conjuncts of S[i]. Should we thereby arrive at a contradiction, we
“backtrack” to the iteration i − 1, generating a different value for
v[i − 1]. Otherwise, we go ahead to the iteration i + 1. If the complete
iteration on i concludes successfully, we arrive thereby at a “satisfiable”
test case for the input variables of the program; otherwise we do
not. Note that the “key” to the method is the fact that at each stage
i, only the variable v[i] has not yet been resolved. Note, however, that
the loop at the right of Fig. 7 must include some criterion for deciding 
that “enough” random numbers have been tried in the current itera- 
tion. But however this is decided, it must be conceded that such an 
approach as presented here has much to offer in its favor, particularly 
considering the difficulty of the “satisfiability” question in general. 
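The backtracking scheme can be sketched as follows. This is a simplified illustration rather than the algorithm of Fig. 7: it always draws v[i] at random (the "solve for v[i]" step for equality relations is omitted), it files each conjunct under the first stage at which all of its variables have values, and the sampling domain, the bound `max_tries` and the seed are assumed parameters.

```python
import random


def trial_and_error(variables, conjuncts, domain=(0.0, 10.0), max_tries=50, seed=0):
    """Search for input values satisfying every conjunct, one variable at a time.

    variables: ordered input names v[1..m]
    conjuncts: list of (names_used, predicate) pairs; each predicate takes
               a dict of the values chosen so far.
    Returns a dict of values, or None if no test case was found.
    """
    rng = random.Random(seed)
    stages = [[] for _ in variables]
    for names, predicate in conjuncts:
        last = max(variables.index(name) for name in names)
        stages[last].append(predicate)       # checkable once v[last] is chosen

    def extend(i, chosen):
        if i == len(variables):
            return chosen
        for _ in range(max_tries):           # "enough" random numbers tried
            trial = dict(chosen)
            trial[variables[i]] = rng.uniform(*domain)
            if all(predicate(trial) for predicate in stages[i]):
                result = extend(i + 1, trial)
                if result is not None:
                    return result
        return None                          # backtrack to iteration i - 1

    return extend(0, {})


# Two of the constraints from the running example (P2 is left out because
# it depends on the called function f).
conjuncts = [
    (("a", "b", "c"), lambda v: 2 * (v["b"] - v["a"]) / 3 <= v["c"]),
    (("a", "b", "c"), lambda v: v["b"] - v["a"] > v["c"]),
]
print(trial_and_error(["a", "b", "c"], conjuncts))
```

Returning None after `max_tries` draws at a stage is what sends the search back to the previous variable.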

The authors provide an example of the use of their algorithm on
the “triangle classification problem” considered earlier. More gener- 
ally, they suggest that the method has proven to be successful in 
treating a much wider class of problems. We would note further that 
the method could conceivably be applied to the “running satisfiability” 
questions that arise in the use of the “symbolic execution” technique.

Fig. 7—Trial and error algorithm for solving the satisfiability problem.






In fact, it seems that this “trial and error” approach has a definite 
place—at least as a method of last resort, to be used as a component 
of any overall testing methodology. 

Without some technique such as this, we are forced to rely on the 
extremely costly and not wholly reliable methods of “mathematical 
programming,” particularly those routines that are designed to gen- 
erate solutions to systems of inequalities. We cannot always assume 
that our systems are linear, in spite of the assumptions made by some 
authors. And in the absence of such an assumption, the problem is 
quite a difficult one, generally beyond the capability of the packages 
that are currently available.

Recognizing this, a most unusual and quite promising approach has
been suggested by Kundu (Ref. 27). The idea is to combine the “test path
selection” and “test case generation” phases of the solution, using 
the previous test case(s) t[k] to help in determining the next test case 
t[k + 1]. The result is a sequence of determinations: 


(t[0] →) p[0] → t[1] → p[1] → ...


starting from an initial test case t[0], chosen at random. The method 
is as follows: 

1. Analyze t[k]: Execute the program with input t[k], and determine 
its execution path p[k]. Then perform a (partial) symbolic execution 
of p[k], so as to determine (an approximation to) its path predicate 
P[k]. 

2. Select next test case: Determine the next test case t[k + 1] so
that it violates at least one constraint in each of the path predicates
P[j], for j ≤ k.

We are thus assured that each new test case t[k + 1] causes the
program to traverse a genuinely new path, different from all those 
previously chosen. 

In comparison with the previous methods we have discussed, Kundu 
reverses the roles of the test paths and the test data. The path p[k] is 
determined from t[k] in order to guide the next test case t[k + 1]
away from previous paths. That is, p[k] is not used for finding an 
input that corresponds to that path itself. Therein lies the novelty of 
the approach. 
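The two-step loop can be illustrated on a toy program. Everything in the sketch below is hypothetical: the program under test has three decisions stated directly on its inputs (x, y), so its path predicate can be read off without symbolic execution, and the next test case is found by random search rather than by any method Kundu describes.

```python
import random

# Hypothetical decisions of the program under test, as (name, test) pairs.
X_POSITIVE = ("x > 0", lambda t: t[0] > 0)
Y_ABOVE_X = ("y > x", lambda t: t[1] > t[0])
Y_ZERO = ("y == 0", lambda t: t[1] == 0)


def path_predicate(t):
    """Execute the toy program on t; return its list of (decision, outcome)."""
    first = X_POSITIVE[1](t)
    second = Y_ABOVE_X if first else Y_ZERO
    return [(X_POSITIVE, first), (second, second[1](t))]


def satisfies(predicate, t):
    """True if input t meets every constraint of the path predicate."""
    return all(test(t) == outcome for (_, test), outcome in predicate)


def generate_tests(max_tests=10, max_tries=200, seed=0):
    """Return test cases t[0], t[1], ... and their path predicates."""
    rng = random.Random(seed)

    def draw():
        return (rng.randint(-3, 3), rng.randint(-3, 3))

    tests, predicates = [], []
    t = draw()                                   # t[0], chosen at random
    while t is not None and len(tests) < max_tests:
        tests.append(t)
        predicates.append(path_predicate(t))     # step 1: analyze t[k]
        t = None
        for _ in range(max_tries):               # step 2: select t[k + 1]
            candidate = draw()
            if not any(satisfies(p, candidate) for p in predicates):
                t = candidate
                break
    return tests, predicates


tests, predicates = generate_tests()
for t, p in zip(tests, predicates):
    print(t, [(name, outcome) for (name, _), outcome in p])
```

Because each accepted candidate violates at least one constraint of every earlier predicate, no two generated test cases follow the same path; the loop stops when no new path is found within `max_tries` draws.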

Moreover, Kundu’s method is definitely not designed with any 
specific measure of test thoroughness in mind. (He asserts that no 
good measures of testedness are available, anyway.) It is clear, how- 
ever, that one could easily augment his procedure with test instrumen- 
tation devices, as discussed earlier, for the purpose of assuring that 
some standard test coverage criterion (e.g., branch coverage) has been 
met. 

The primary advantage of Kundu’s method is easily understood. 
Consider the constraint on t[k + 1] as described in (2) above, i.e., 


t[k + 1] not in P[1] ∪ P[2] ∪ ... ∪ P[k].


It is clear that the “forbidden region” for t[k + 1] thus represents only
a small portion of the total input domain D (see Fig. 8). This is so 
because the number of test cases generated in the testing activity is 
very small compared with the total number of executable paths in the 
program. Intuitively, the determination of the required t[k + 1] should 
thus be relatively easy. And for the same reason, the determination of 
a test datum in a given path domain (as is required in the usual 
strategy) should be more difficult. Kundu (see Ref. 27, pp. 176-177)
gives a more detailed account of this reasoning, and the thrust of his
argument is quite compelling. The reader may wish to consult Kundu’s
article for these additional details.

Fig. 8—Illustration of the forbidden region for selecting test cases.


VI. CONCLUDING REMARKS 


We have attempted to describe the many interesting and varied 
approaches to the program testing problem. Whereas no single ap- 
proach to the problem may hold all the answers, it seems that there 
are enough good ideas around as to suggest the feasibility of a workable 
methodology, based on one or another combination of the strategies 
that have been advanced to date. 

It must be remembered, however, that the thrust of our presentation, 
and, indeed, the main thrust in the literature has been toward the 
“unit test” level, where smaller programs are encountered. Thus, the 
ideas we have presented are, at the present time, feasible only in the 
case of programs of limited size. To think that we are nearing the 
point where we are ready to apply all of these techniques to the testing 
of an entire operating system or a compiler would be to miss the point 
completely. Nevertheless, our study has shown that indeed a start has 
been made. 

We have tried to present a reasonably balanced survey of the recent 
contributions to the research literature on software testing method- 
ology. It is perhaps likely that one or more worthwhile studies have 
escaped the author’s attention, and therefore, their omission from this 
survey should not reflect on their importance to the development of 
the field. Moreover, the author can only hope that the studies that 
have been cited here have been presented in their best light. Limitations
of time and space have prevented a more complete treatment of
these works, and for this, apologies to the authors are in order. At the 
same time, this author would like to acknowledge the use of the many 
cogent examples from the literature cited, hoping as well that these 
and other contributions have been faithfully reported. 

In conclusion, the author would like to thank W. H. Leung, K. A. 
Gluck, and N. H. Petschenik for their most helpful comments in 
reviewing an earlier draft of the manuscript. 


Parallel Fault Simulation Using Distributed Processing

Y. H. Levendel, P. R. Menon, and S. H. Patel

This paper presents a method of performing fault simulation of digital logic
circuits using a special-purpose computer with distributed processing. The 
architecture for true value simulation presented in an earlier paper can also 
be used for parallel fault simulation. The special-purpose computer consists 
of inexpensive microprocessors interconnected by either a time-shared parallel 
bus or a cross-point matrix. The cross-point matrix provides higher perform- 
ance than the time-shared parallel bus. The performance of the proposed 
simulator is better by over two orders of magnitude than traditional logic fault 
simulation performed on a general-purpose computer. The power of the 
simulator is proportional to the number of microprocessors over a certain 
range. 


I. INTRODUCTION 


Fault simulation is an important part of the logic circuit design 
process. It is a means of determining the behavior of a logic circuit in 
the presence of each one of a predefined set of faults. 

One of the most common uses of fault simulation is in determining 
the set of faults detected by a proposed test sequence, i.e., its fault
coverage. Adequate fault coverage (usually greater than 90 percent of 
single stuck faults) is necessary to guarantee that the test sequence 
will detect most of the manufacturing defects. 


* This paper is based upon material to be submitted by S. H. Patel in partial 
fulfillment of the requirements for the Ph.D. in Electrical Engineering at the 
Illinois Institute of Technology. * Bell Laboratories. 
In test pattern generation, fault simulation is used to determine the
faults that are detected by the tests already generated so they can be 
removed from consideration. A yet undetected fault is then selected 
as a target for the next test. Additional faults detected by a newly 
generated test may be determined by simulation. Thus, fault simula- 
tion is used frequently as a part of the test generation process. 

Fault simulation is also used to construct fault dictionaries for fault 
isolation. Other uses of fault simulation include the evaluation of test 
point effectiveness and the evaluation of self-checking circuitry. Ef- 
fective test points are essential for factory testing. It is much cheaper 
and easier to locate and repair failures during manufacture than it is 
in the field. Also, good self-checking circuitry makes it easier to isolate 
faults in the field. 

Currently, fault simulation is carried out on large general-purpose 
computers. This method has seen some use in large-scale integrated 
(LSI)* designs, but suffers from excessive run time at current levels 
of integration. Its applicability to very large-scale integration (VLSI) 
is doubtful, at least in the manner that it is currently used.¹ There is
a need for more sophisticated and cost-effective fault simulators as 
very large simulation time and costs will result when dealing with 
circuits of VLSI complexity (more than 100,000 gates on a single chip). 


II. PARALLEL FAULT SIMULATION


A number of different algorithms have been developed for perform- 
ing fault simulation efficiently on general-purpose computers. Among 
these the best known and most widely used are the parallel,² deductive,³
and concurrent⁴ methods. All these methods attempt to simulate the
effects of a number of individual faults simultaneously. This paper 
will consider the use of the parallel fault simulation algorithm in the 
special-purpose simulation hardware architecture developed in Ref. 5. 

In parallel fault simulation the fault-free circuit and several different 
faulty circuits are processed simultaneously. The number of faulty 
circuits simulated in parallel is normally constrained by the number 
of bits in the host computer word. One bit of the computer word 
represents the signal value on a line in the fault-free circuit, while the 
remaining bits represent values on the same line in the presence of 
different single faults. Word-oriented operations performed on the 
host computer imply that the fault-free and faulty circuits are handled 
simultaneously and in exactly the same manner. 

The output fault word of a gate is computed by simple word-oriented
logic operations on the input fault words. The logic operations performed
on the fault words correspond to the logic operation performed
by the gate. Faults are injected using predefined masks. A stuck-at-1
fault on a lead is injected by ORing the fault word with a mask
containing a 1 in the bit position for the faulty value and 0’s elsewhere.
Similarly, a stuck-at-0 fault can be injected by ANDing the fault word
with a mask containing a 0 in the bit position for the faulty value and
1’s elsewhere.

* Acronyms and abbreviations are defined in the Glossary at the back of this article.
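The mask-based injection described above can be sketched in a few lines of Python. This is a minimal two-valued illustration, not the authors' implementation: the 16-bit word length, the single AND gate, the assignment of faults to bit positions, and all function names are assumptions chosen for the example.

```python
WORD_BITS = 16                      # assumed host word length
ALL_ONES = (1 << WORD_BITS) - 1

def broadcast(bit):
    """Replicate a fault-free input value into every bit position."""
    return ALL_ONES if bit else 0

def inject_stuck_at_1(word, pos):
    """OR with a mask holding a 1 at the faulty position and 0's elsewhere."""
    return word | (1 << pos)

def inject_stuck_at_0(word, pos):
    """AND with a mask holding a 0 at the faulty position and 1's elsewhere."""
    return word & (ALL_ONES ^ (1 << pos))

def simulate_and_gate(a_bit, b_bit):
    """z = a AND b. Bit 0 is the fault-free circuit; bit 1 models a
    stuck-at-0, bit 2 models b stuck-at-1, bit 3 models z stuck-at-1."""
    a = inject_stuck_at_0(broadcast(a_bit), 1)
    b = inject_stuck_at_1(broadcast(b_bit), 2)
    return inject_stuck_at_1(a & b, 3)

def detected_faults(z, n_faults=3):
    """Bit positions whose output differs from the fault-free bit 0."""
    diff = z ^ broadcast(z & 1)
    return [pos for pos in range(1, n_faults + 1) if (diff >> pos) & 1]
```

A single word-wide AND evaluates the good circuit and all faulty circuits at once; comparing each bit with bit 0 shows which faults the applied input pattern detects.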

Two logic values are not sufficient for accurate logic simulation. 
Since each bit position in the computer word can represent only a 
logical 0 or 1, more than one bit per signal is required for multiple-value
simulation.⁶ In this case more than one word is required. For
three-value representation two bits are required to represent each 
signal and, therefore, two computer words are required for representing 
a set of fault-free and faulty values. Since a pair of words are used to 
represent a signal, some sort of coding method is required to implement 
parallel simulation. A commonly used method of coding denotes one 
of the words as the 0-word and the other word as the 1-word. Let $i_0$
represent the ith bit in the 0-word and $i_1$ represent the ith bit in the
1-word. Then the $i_0 i_1 = 01$ combination represents a logical 1, the
$i_0 i_1 = 10$ combination represents a logical 0, the $i_0 i_1 = 00$ combination
represents an unknown, and the $i_0 i_1 = 11$ combination is unused.
Simple word-oriented operations are still sufficient for performing the 
parallel simulation. This coding is the same as that in Ref. 7. Faults 
are injected in the 0-word and the 1-word using a 0-mask and a 1- 
mask, respectively. This injection is also done using simple logic 
operations. The method for simulating three logic values can be 
extended to any number of logic values by coding them using a 
sufficient number of bits.® Only three-valued simulation is considered 
in this paper. 
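To make the 0-word/1-word coding concrete, the sketch below evaluates three-valued AND, OR, and NOT with plain word operations. The gate rules follow directly from the coding (a result is 0 where any input is known 0, and so on), but the class `Signal`, the helper names, and the 16-bit word length are assumptions for illustration, not taken from Ref. 7.

```python
from typing import NamedTuple

WORD_BITS = 16                      # assumed host word length
ALL_ONES = (1 << WORD_BITS) - 1

class Signal(NamedTuple):
    zero: int   # 0-word: bit set where the value is a logical 0
    one: int    # 1-word: bit set where the value is a logical 1

def const(value):
    """Broadcast '0', '1', or 'X' (unknown) to every bit position."""
    if value == "0":
        return Signal(ALL_ONES, 0)
    if value == "1":
        return Signal(0, ALL_ONES)
    return Signal(0, 0)

def and3(a, b):
    return Signal(a.zero | b.zero, a.one & b.one)

def or3(a, b):
    return Signal(a.zero & b.zero, a.one | b.one)

def not3(a):
    return Signal(a.one, a.zero)

def stuck_at(sig, pos, value):
    """Inject a stuck-at fault at bit position pos using a 0-mask and a 1-mask."""
    bit = 1 << pos
    if value == 1:
        return Signal(sig.zero & (ALL_ONES ^ bit), sig.one | bit)
    return Signal(sig.zero | bit, sig.one & (ALL_ONES ^ bit))

def decode(sig, pos):
    pair = ((sig.zero >> pos) & 1, (sig.one >> pos) & 1)
    return {(1, 0): "0", (0, 1): "1", (0, 0): "X"}.get(pair, "unused")
```

For example, `and3(const("1"), const("X"))` stays unknown in every position, while injecting a stuck-at-0 on the unknown input at position 1 forces the output at that position to 0.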

The most widely used method of parallel simulation is the event- 
driven method. Event-driven simulation means that an element is not 
simulated unless there is a change in one of its input fault words. The 
main operations performed in an event-driven simulation are the processing
of active elements and the scheduling of changes to occur in the future. The
scheduling is done on a timing wheel. A timing wheel is a list structure 
in which events are chained together in the order they are to occur. In 
parallel fault simulation an element is considered to be active if the 
fault word associated with its output changes. The fault word is 
considered changed even if only one of its bits changes. All the values 
(i.e., the whole fault word) are propagated even if only one of the 
values changes. 
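A simplified sketch of the event-driven loop may help here. It assumes a circular timing wheel of fixed size, integer gate delays between 1 and `wheel_size - 1`, and 16-bit fault words; the function `run` and the data layout are hypothetical and much simpler than the algorithm of Appendix C.

```python
ALL_ONES = 0xFFFF  # assumed 16-bit fault words

def run(gates, fanout, values, events, wheel_size=8, cycles=16):
    """gates: name -> (function, input names, delay).
    events: list of (name, new fault word) applied at time 0.
    Returns the number of element evaluations performed."""
    wheel = [[] for _ in range(wheel_size)]    # one event list per time slot
    wheel[0].extend(events)
    evaluations = 0
    for t in range(cycles):
        slot = wheel[t % wheel_size]
        current = list(slot)
        slot.clear()
        active = []
        for name, word in current:
            if values[name] != word:           # a change in any single bit counts
                values[name] = word            # the whole fault word is propagated
                active.extend(fanout.get(name, []))
        for gate in dict.fromkeys(active):     # evaluate each activated element once
            func, inputs, delay = gates[gate]
            new_word = func(*(values[i] for i in inputs)) & ALL_ONES
            evaluations += 1
            if new_word != values[gate]:       # schedule only real output changes
                wheel[(t + delay) % wheel_size].append((gate, new_word))
    return evaluations
```

Elements whose input words do not change are never evaluated, which is the source of the savings; note, however, that a change in one bit of a fault word still triggers evaluation of the full word.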

Since the number of faults simulated at one time is restricted by 
the length of the host computer word, multiple passes through the 
simulator are necessary to simulate a large number of faults. It is 
possible to reduce the number of passes by using extra computer words 
and simulating more faults during one pass. For example, 64 computer
words can be used to simulate (two-value simulation) 1024 faults on a 
computer with a word size of 16 bits. The string of values in the set of 
computer words representing the fault-free and faulty signal values on 
a line will be called the value vector. The configuration of the value 
vector is shown in Fig. 1. The value vector consists of $L_p$ word pairs
for three-value simulation. Two bits at the same position in the two
words of a pair represent the signal value of one faulty circuit.
Simulating $L_p$ word pairs at a time is better than making $L_p$ passes
through the simulator since the overhead involved in the fanout search 
associated with each pass is saved. (However, as we will see in Section 
6.1, some of these savings are lost due to increased activity as the 
number of faults per pass increases.) The number of words used in the 
value vector is usually constrained by the space requirements of the 
computer. 

During simulation, operations are performed on value vectors by 
considering word pairs. Thus, the time required to perform an opera- 
tion on the value vector is proportional to the number of word pairs 
used in the value vector. For example, if there are several word pairs 
in a value vector, then faults are injected one word pair at a time. 

The whole value vector is considered active if any of the values in 
the vector changes. Furthermore, all the word pairs are propagated 
even if only one word pair changes. An element is considered active if 
the value vector associated with its output is active. 


III. CONCURRENCY IN FAULT SIMULATION


At least three types of concurrencies exist in fault simulation of
logic circuits. The first type of concurrency occurs in the actual
simulated hardware, the second type occurs in the simulation algorithm,
and the third type occurs in the form of fault activity. The first
two types of concurrencies also occur in true-value simulation of logic
circuits and have been discussed in an earlier paper.⁵

Fig. 1—Value vector configuration.

The concurrency occurring in the actual simulated hardware can be 
called logic circuit concurrency. Utilizing this type of concurrency leads 
to distributed processing with identical processors performing identical 
tasks. The architecture for true-value simulation developed by Lev- 
endel et al.° and Denneau et al.*"' takes advantage of this type of 
concurrency. 

The concurrency occurring in the simulation algorithm can be called 
algorithm concurrency. This concurrency is indirectly due to the con- 
currency occurring in the actual simulated hardware. Since several 
elements can be simultaneously active and a sequence of steps is to be 
performed for each active element, they can be processed in a pipeline 
fashion. Utilizing this type of concurrency leads to functional parti- 
tioning of tasks among several processors and a pipelined architecture. 
The architecture for true-value simulation developed by Barto and 
Szygenda” and Abramovici et al.!° takes advantage of this type of 
concurrency. 

For efficient fault simulation, a number of faults are simulated 
simultaneously in software-based simulators. This leads to fault activ- 
ity concurrency, which can be utilized in special-purpose hardware for 
fault simulation. 

This paper extends the architecture for true-value simulation de- 
scribed in Ref. 5 to fault simulation using the parallel method. The 
architecture takes advantage of the parallelism due to logic circuit 
concurrency and fault activity concurrency. The main difference be- 
tween true-value simulation and fault simulation is in the algorithm 
executed by the individual processing units. 


IV. SPECIAL-PURPOSE ARCHITECTURE 


The simulator consists of one master and a number of slaves 
(processors) interconnected by a communication structure (Fig. 2). 
The communication structure is used as a medium for transferring 
data between the slaves and between the slaves and the master. The 
communication structure can be either a time-shared parallel bus or a 
cross-point matrix. The circuit to be simulated is partitioned into 
subcircuits and each subcircuit is simulated in a separate processor. 
Subcircuits in different processors become active as signal values 
proceed from the primary inputs to primary outputs. As simulation 
progresses, data are transferred between subcircuits as the logic values 
on the signal connections between two subcircuits change. These data 
are transported across the communication structure. Typical data sent
across the data path consist of element information and changed value
vectors. The architecture of the simulator has been described in detail
in the previous paper.⁵ A summary of the architecture description is
presented in Appendix A for completeness.

Fig. 2—Multiprocessor-based digital logic simulator.

The overall architecture for true-value simulation is applicable to 
fault simulation since the algorithms for the two types of simulations 
are the same except that fault simulation requires: 

1. Carrying of faulty signal values in addition to true-value signals
(more than one word may be used to allow representation of a larger
number of faulty signals).

2. Fault injection using masks.

3. Multiple passes if the fault set is large.

The processing time per pass will be higher for fault simulation due 
to the extra processing required for injecting faults and manipulating 
multiple-word pairs in a value vector. The latter requirement applies 
only when the value vector contains more than one word pair. As far 
as communication between the processors is concerned, the data 
transferred between the processors are greater than those transferred
for true-value simulation, since the signal values require one or more 
complete words in addition to a word for the element number. 


V. ARCHITECTURE EVALUATION 


The processing time per simulation cycle and the communication 
times using a time-shared parallel bus and a cross-point matrix are 
estimated first. These results are then used for selecting the commu- 
nication structure. A multiple-bus communication structure is also 
discussed in this section. 


5.1 Processing time $t_p$


The average number of active elements per processor during a 
simulation cycle for true-value simulation is given by kN/n, where N 
is the average number of active elements per simulation cycle during 
true-value simulation, n is the number of processors in the multi- 
processor simulator, and k is the average unbalance factor representing 
the extra active elements per processor during a simulation cycle due 
to nonideal partitioning.” Ideal partitioning will cause an equal number 
of elements to be active in all the processors during all simulation 
cycles. However, because of some imbalance such as the fanout of all 
active elements during one simulation cycle not feeding equally into 
all the processors, some processors will have more active elements 
than the others during some simulation cycles. The average number 
of active elements per processor during a fault simulation cycle can be 
written as $kN_f/n$, where $N_f$ is the average number of active elements
per simulation cycle in one pass during fault simulation. The value of
$N_f$ is expected to be larger than $N$. Indeed, experimental runs on the
Logic Analyzer for Maintenance Planning (LAMP) simulator™ show 
that the overall activity (total number of active elements) during 
parallel fault simulation increases by a factor of about 2 for 16 faults 
per pass and by about 3.5 for 1024 faults per pass compared to true- 
value simulation. These results are averaged over several runs made 
using both combinational and sequential circuits with sizes ranging 
from 420 gates to 1912 gates. The number of faults simulated ranged 
from 1020 faults for the 420-gate circuit to 5046 faults for the 1912- 
gate circuit. The LAMP simulator is based on the deductive method. 
A mapping mechanism from deductive simulation activity to parallel 
simulation activity was implemented to predict the results for parallel 
simulation. 

During one simulation cycle the following major operations occur 
in the given order: 

1. Using the current list of events, list $L_i$ of the timing wheel (Fig.
11 in Appendix B), update and find the fanout of the elements whose
outputs changed during the current simulation cycle.

2. Using the next list, $L_{i+1}$, of the timing wheel, prepare external
events to be transmitted to other processors for the next time interval.

3. From data in the Input FIFO Buffer (IFB) (sent by other processors),
update and find the fanout of the elements active during the current
simulation cycle.

4. Evaluate the fanout of active elements (this includes fault injection).

5. Schedule on the timing wheel the elements whose outputs change.

The detailed algorithm is given in Appendix C.

Let $t_a$ be the time required to process one active element. The
average processing time per processor during one simulation cycle is
then given by:

$$t_p = \frac{kN_f}{n}\, t_a.$$


Assume a microprogrammable microprocessor (e.g., Am2900 series)
for each slave processing unit (PU) and the following operation
times (150 ns cycle time): memory-to-memory move = 1.2 µs, memory-to-register
move = 0.6 µs, and memory-to-memory logical AND/OR
operation = 1.5 µs. Using these major microprocessor operations, the
execution time for each operation in the parallel simulation algorithm
described in Appendix C can be estimated. For example, obtaining
each fanout of an updated element (after the fanout list has been
accessed) takes one memory-to-register instruction and moving the
element number to the Output FIFO Buffer (OFB) takes one memory-to-memory
instruction. The processing times per active element for
the various steps of the algorithm are shown in Table I, where $f_i$ is the
average fan-in, $f_o$ is the average fan-out, and $L_p$ is the number of word
pairs in the value vector.

The total processing time per element during one simulation cycle
is the sum of all the expressions in Table I:

$$t_a = 9.6 + 8.7L_p + 3f_o + 3f_iL_p - \frac{2.4 + 7.2L_p}{f_o} \ \mu\text{s}.$$

Taking typical values of $f_i$ and $f_o$ to be 2 and an unbalance factor of
$k = 1.1$, the processing time for a simulation cycle becomes:

$$t_p = (15.8 + 12.2L_p)\,\frac{N_f}{n} \ \mu\text{s}. \tag{1}$$
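As a numerical check on eq. (1), the sketch below sums four per-step times and applies the unbalance factor. The row expressions are a reading of Table I in which every fraction has the average fan-out as its denominator; that reading, and the function names, are assumptions, chosen because they reproduce the coefficients 15.8 and 12.2 when the fan-in and fan-out are 2 and k = 1.1.

```python
def table_rows_us(L_p, f_i=2.0, f_o=2.0):
    """Per-element step times in microseconds (assumed reading of Table I)."""
    return [
        (1.8 + 2.4 * f_o) / f_o,                        # update from timing wheel
        (f_o - 1) * (6.6 + 3.6 * L_p) / f_o,            # prepare external events
        (f_o - 1) * (1.2 + 3 * f_o + 3.6 * L_p) / f_o,  # update from IFB
        2.4 + 1.5 * L_p + 3 * f_i * L_p + 3.6 / f_o,    # evaluate and schedule
    ]

def t_a_us(L_p, f_i=2.0, f_o=2.0):
    """Time to process one active element: the sum of the step times."""
    return sum(table_rows_us(L_p, f_i, f_o))

def t_p_us(L_p, n, N_f=1.0, k=1.1):
    """Average processing time per processor per simulation cycle."""
    return k * N_f * t_a_us(L_p) / n
```

With one word pair, `t_a_us(1)` is 25.5 µs, and `t_p_us(1, n)` is 28.05/n µs per active element, in agreement with (15.8 + 12.2)/n from eq. (1) after rounding.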


5.2 Communication time $t_c$


The value of $t_c$ will depend on the type of communication structure.

Table I—Simulation cycle timings for parallel simulation

Step                                             Expression (µs)
Update data from timing wheel                    $(1.8 + 2.4f_o)/f_o$
Prepare external events for next time interval   $(f_o - 1)(6.6 + 3.6L_p)/f_o$
Update data from IFB                             $(f_o - 1)(1.2 + 3f_o + 3.6L_p)/f_o$
Evaluate and schedule                            $2.4 + 1.5L_p + 3f_iL_p + 3.6/f_o$


Fig. 3—An element string.


The two types of communication structures discussed in Ref. 5, namely
the time-shared parallel bus and the cross-point matrix, will be considered
here also. The partitioning algorithm discussed in the previous
paper⁵ partitions a circuit along its depth rather than its breadth.
Since the signals in a circuit propagate in parallel, this places concurrent
activities in different blocks. The same partitioning algorithm
will be assumed here, since during fault simulation the signals still
propagate in the same manner. The logic circuit to be simulated is
partitioned into element strings (see Fig. 3). The average number of
communication events generated by one active element during a
simulation cycle that have to be sent over the communication structure
is:

$$e = \frac{f_o + (c - 1)(f_o - 1)}{c},$$

where $f_o$ is the average fanout and $c$ is the average number of elements
in one element string. The typical value of $f_o$ can be taken as 2, and
for large circuits $c$ is expected to be greater than 10. For $f_o = 2$ and
$c = 10$, $e$ will be equal to 1.1.


5.2.1 Time-shared parallel bus 


The total communication time during a simulation cycle for true-value
simulation is given by⁵:

$$t_{c(\mathrm{bus})} = (n + 200)\,Ne \ \text{ns}.$$

This expression assumes one word of data to be transferred per active
element. For fault simulation with $w$ words of data to be transferred
per active element the expression becomes:

$$t_{c(\mathrm{bus})} = (n + 200)\,N_f e w \ \text{ns}.$$

The number of words to be transferred is given by:

$$w = 1 + 2L_p,$$

where $L_p$ is the number of word pairs per value vector used in
simulation. One word is required to carry the element number and the
$2L_p$ words carry the value vector. Taking the value of $e = 1.1$:

$$t_{c(\mathrm{bus})} = (1.1n + 220)(1 + 2L_p)\,N_f \ \text{ns}. \tag{2}$$


5.2.2 Cross-point matrix 

In a cross-point matrix-based communication structure several
processors can be simultaneously sending data to other processors.
The total communication time during a simulation cycle for true-value
simulation is given by⁵:

$$t_{c(\mathrm{matrix})} = \left[\frac{200Nke}{n} + 50j\right] \text{ns},$$

where $j$ is the number of events for which the destination processor is
found busy, i.e., the destination processor is communicating with some
other processor. Once again this expression assumes one word of data
transferred per active element. For $w$ words of data to be transferred,
the expression for fault simulation becomes:

$$t_{c(\mathrm{matrix})} = \left[\frac{200N_fkew}{n} + 50jw\right] \text{ns}.$$

For $k = 1.1$, $e = 1.1$, $w = 1 + 2L_p$, and $j = 0.1N_f/n$ (the channel is
found busy for 10 percent of the transfer requests), the above expression
can be rewritten as:

$$t_{c(\mathrm{matrix})} = (247 + 494L_p)\,\frac{N_f}{n} \ \text{ns}. \tag{3}$$
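The two communication-time expressions can be checked numerically with a short sketch. The formula used for the number of external events per active element (all fanouts except the one that stays inside the element string, with the last element of a string sending every fanout outside) is an assumed reading that gives 1.1 for a fanout of 2 and strings of 10 elements; the function names and default arguments are likewise illustrative.

```python
def events_per_element(f_o=2.0, c=10.0):
    """Average external communication events per active element (assumed form)."""
    return (f_o + (c - 1) * (f_o - 1)) / c

def words_per_event(L_p):
    """One word for the element number plus 2*L_p words for the value vector."""
    return 1 + 2 * L_p

def t_c_bus_ns(n, N_f, L_p, e=1.1):
    """Time-shared parallel bus: all events are serialized on one bus."""
    return (n + 200) * N_f * e * words_per_event(L_p)

def t_c_matrix_ns(n, N_f, L_p, k=1.1, e=1.1, busy_fraction=0.1):
    """Cross-point matrix: transfers proceed in parallel across processors."""
    w = words_per_event(L_p)
    j = busy_fraction * N_f / n     # events that find the destination busy
    return 200 * N_f * k * e * w / n + 50 * j * w
```

For one word pair and one active element, the bus needs about 778.8 ns at n = 36, while the matrix needs about 7.41 ns at n = 100, matching (1.1n + 220)(3) and 741/n from eqs. (2) and (3).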


5.3 Choice of communication structure 


The expressions for the processing time per simulation cycle per 
active element and the communication times per simulation cycle per 
active element for the bus and matrix structures are plotted in Fig. 4. 
The expressions are plotted for a value vector length of one word pair (16
bits/word), i.e., $L_p = 1$.

Fig. 4—Variation in processing and communication time.


The curves for the processing time and bus communication time for 
the parallel bus intersect at n = 36. The processing time is greater 
than the bus communication time for n < 36. Thus the processing 
time is the bottleneck. The processing time decreases as the value of 
n increases. For n > 36 the bus communication time becomes larger 
than the processing time and the bus communication time becomes 
the bottleneck. Therefore, using more processors than n = 36 will not 
speed up the simulation. For optimum performance n = 36, and the 
length of the simulation cycle per active element becomes $t_m$ = 0.78
µs. Based on the parallel simulation algorithm given in Appendix C,
the actual simulation cycle length per active element for a single
processor can be estimated as $t_s$ = 24 µs. The multiprocessor fault
simulator with a bus-based communication structure provides a speedup
of 31 over the traditional single-processor logic fault simulator.

For further speedup a faster communication structure must be used. 
Figure 4 also shows the curve for the matrix communication time. 
This curve does not intersect with the curve for the processing time, 
and the communication time is always less than the processing time. 
The communication time will therefore never be a bottleneck. More 
processors can be added to speed up the simulator. For example, for 
n = 100 the speedup compared to the traditional single-processor logic 
fault simulator is 86 and for n = 256 the speedup is 220. The speedup 
of simulation is expected to be greater than two orders of magnitude 
for n > 120. 
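The crossover point and the quoted speedups can be reproduced from eqs. (1) to (3) with a short sketch. It assumes that the simulation cycle length is the larger of the processing and communication times, works per active element with one word pair, and takes the 24 µs single-processor estimate from the text; the function names are illustrative.

```python
def t_p_ns(n, L_p=1):
    """Eq. (1) per active element, in nanoseconds."""
    return (15.8 + 12.2 * L_p) * 1000 / n

def t_c_bus_ns(n, L_p=1):
    """Eq. (2) per active element."""
    return (1.1 * n + 220) * (1 + 2 * L_p)

def t_c_matrix_ns(n, L_p=1):
    """Eq. (3) per active element."""
    return (247 + 494 * L_p) / n

def cycle_ns(n, comm):
    """Cycle length is set by whichever of processing or communication is slower."""
    return max(t_p_ns(n), comm(n))

def speedup(n, comm, t_single_us=24.0):
    return t_single_us * 1000 / cycle_ns(n, comm)

def best_bus_size(max_n=300):
    """Number of processors that minimizes the cycle length on the shared bus."""
    return min(range(1, max_n + 1), key=lambda n: cycle_ns(n, t_c_bus_ns))
```

Under these assumptions `best_bus_size()` returns 36 with a speedup of about 31, and the matrix gives about 86 at n = 100 and close to 220 at n = 256, since its communication time never exceeds the processing time.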


5.4 Multiple-bus communication structure 


The results of the previous section show that for a given number of 
faults per pass, the time-shared parallel bus is useful only for up to a 
fixed value of n. For further speedup the cross-point matrix has to be 
used. However, the cross-point matrix is not used up to its maximum capability.
For example, at 16 faults per pass and n = 100, the simulation
cycle length is $280N_f$ ns while the communication cycle length is $7.4N_f$
ns, i.e., the communication structure is used less than 3 percent of the
time. A communication structure that is slower and cheaper than the 
cross-point matrix might prove more cost-effective. This will be true 
especially since the control for the cross-point is very complex and 
thus expensive. 

A communication structure that provides a capacity in between that 
of the time-shared parallel bus and the cross-point matrix is the 
multiple-bus structure. It consists of a bus arbitrator and several 
parallel buses. The configuration of the multiple-bus structure and its 
interface to the Output Data Sequencers (ODSs) and Input Data 
Sequencers (IDSs) of the slave units are given in Fig. 5. When the 
ODS needs to send data, it sets the Request To Send (RTS) line high 
and puts the destination address on the address lines. The requesting 
ODS keeps the RTS line high until granted a bus. The bus arbitrator 
grants it the use of the communication medium when it finds an 
unused bus and determines that the requested destination is not busy. 
The bus arbitrator grants the bus by setting the Bus Grant line high. 
The data are switched through the bus selector switch to the available 
bus. At the other end of the bus there is a line selector from which the 
data are sent to the destination processor. The ODS sends out all the 
data present in its Output FIFO Buffer (OFB). The data received by 
the IDS of the destination unit are put in its Input FIFO Buffer (IFB). 
The ODS then sets the RTS line low. This releases the bus, which is
then granted by the bus control to another requesting slave or the
master. All units have equal priority. The ODS will set the Request
To Send line high again if it gets more data to transfer in the OFB.

Fig. 5—Configuration of multiple-bus structure.

The data sent out to a slave unit from another slave unit or the 
master consist of element information and changed value vectors. The 
data sent to a master consist of the address of the sending slave, 
element number (primary output or monitored point), and value 
vector. A separate line Request to Send to Master (RTSM) is used to 
address the master. When the destination is the master, the address 
lines from the ODS contain the sending slave unit address. This 
address together with the element number and value vector is stored 
in the master IFB by the master IDS. 

To obtain an expression for the communication time for the multiple-bus
structure, consider first of all the bus structure with only a
single bus. For this case, the communication time will consist of the
same components as that for the time-shared parallel bus⁵:

$$t_{c(\mathrm{mbus})} = (t_{brg} + t_{as} + t_{da} + t_{br})(1 + 2L_p)\,N_f e,$$

where $t_{brg}$ is the bus request and grant time, $t_{as}$ is the address and data
setup time, $t_{da}$ is the data acknowledge time, and $t_{br}$ is the bus release time.
For the multiple-bus structure the bus request and grant time, $t_{brg}$, will
be greater than in the parallel bus since extra checking has to be done
before a bus is granted. The bus arbitrator will have to determine if a
bus is available and, if one is available, it has to further determine if the
requested destination is busy. Assuming these extra actions double the
time required for the bus request and grant and the other times
remain the same: $t_{brg} = 200$ ns, $t_{as} = n$ ns, $t_{da} = 50$ ns, $t_{br} = 50$ ns, and
$e = 1.1$. Each transaction across the multiple-bus has to wait for a bus
to be granted. As more buses are added, the transactions can occur in
parallel. Assuming the number of parallel buses is much smaller than
the number of processors, the probability of the destination processor
being busy will be small. The decrease in the total communication
time will then be proportional to the number of buses. The expression
for the total communication time for the multiple-bus communication
structure becomes:

$$t_{c(\mathrm{mbus})} = \frac{(1.1n + 330)(1 + 2L_p)\,N_f}{b} \ \text{ns}, \tag{4}$$

where $b$ is the number of parallel buses.

Let $n_o$ be the number of processors at the optimum operation point,
where $t_p = t_{c(\mathrm{mbus})}$. If the simulator operates with the number of
processors not equal to $n_o$, then the speed of simulation will be lower.
An expression for $n_o$ can be derived by equating eqs. (1) and (4):

$$n_o = -150 + 150\left[1 + b\left(\frac{0.49L_p + 0.64}{2L_p + 1}\right)\right]^{0.5}.$$

This expression is plotted for various values of $b$ with $L_p = 1$ in Fig.
6. It can be seen that higher performance is available with the multiple-bus
structure compared to the single time-shared parallel bus for
$b \ge 2$.
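The optimum number of processors for the multiple-bus structure can be evaluated directly. The sketch below codes the closed-form expression obtained by equating eqs. (1) and (4) and, as a cross-check, the residual between the two sides at that point; the function names are illustrative, and the small residual comes from the two-digit rounding of the coefficients.

```python
import math

def n_opt_mbus(b, L_p=1):
    """Processors at which processing time equals multiple-bus communication time."""
    return -150 + 150 * math.sqrt(1 + b * (0.49 * L_p + 0.64) / (2 * L_p + 1))

def balance_gap_ns(n, b, L_p=1):
    """Eq. (1) minus eq. (4), per active element, in nanoseconds."""
    t_p = (15.8 + 12.2 * L_p) * 1000 / n
    t_c = (1.1 * n + 330) * (1 + 2 * L_p) / b
    return t_p - t_c
```

With one word pair, a single bus of this kind balances at about 26 processors (fewer than the 36 of the time-shared parallel bus, because of the doubled request and grant time), two buses at about 49, and five buses at about 105.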

The implementation of the multiple-bus structure is expected to be
less expensive than that of the cross-point matrix. The complexity of
the communication structures and thus their cost is proportional to
the number of switch points. For the cross-point matrix, the number
of cross-points is proportional to the square of the number of inputs, i.e., $n^2$. For
the multiple-bus structure two points have to be connected to establish
a connection and thus the number of switch points is $2bn$. The cost of
the multiple-bus structure will be less than that of the cross-point
matrix as long as $2b < n$. As we saw in the analysis done earlier this
will always be the case. For $b = 5$ and $n_o = 105$, the cost of the
multiple-bus structure will be an order of magnitude lower than that
of the cross-point matrix. Furthermore, it must also be noted that
physical switching is not necessary in the case of the multiple-bus
structure; it is cheaper to use logic enables for connecting an ODS and
an IDS to a bus. This will result in even lower cost when compared
with the cross-point matrix.

Fig. 6—Variation in the optimum number of processors with the number of parallel
buses.
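The cost comparison reduces to a ratio of switch-point counts, which the following sketch evaluates. It assumes, as the text does, that cost is simply proportional to the number of switch points; the function names are illustrative.

```python
def switch_points_matrix(n):
    """Cross-point matrix: one switch point per input/output pair."""
    return n * n

def switch_points_mbus(n, b):
    """Multiple-bus structure: each of n units connects to b buses at both ends."""
    return 2 * b * n

def cost_ratio(n, b):
    """How many times more switch points the matrix needs than the multiple bus."""
    return switch_points_matrix(n) / switch_points_mbus(n, b)
```

For 105 processors and 5 buses the ratio is 10.5, i.e., 11,025 switch points against 1,050, which is the order-of-magnitude difference cited above.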


VI. EFFECT OF NUMBER OF FAULTS PER PASS 


When simulating a large number of faults per pass in parallel 
simulation, several pairs of words carrying the true and faulty values 
($L_p$) must be manipulated per active element. All the faulty values in
the words can be considered together as a vector. All the faulty values 
are evaluated even if only one faulty value is active. 

The behavior of the multiprocessor fault simulator, using the par- 
allel fault simulation algorithm, will be investigated for variations in 
the number of faults simulated per pass. Comparisons will be made 
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between 16, 32,64, 256, and 1024 faults per pass, assuming a processor 
word length of 16 in all the cases. 

Only the time-shared parallel bus and the cross-point matrix are 
discussed. The communication structure with multiple buses is not 
considered since the results for the time-shared parallel bus will apply, 
except for a scaling factor. 


6.1 Simulator with time-shared parallel bus 


The expressions for processing time per simulation cycle, $t_p$, and the
parallel bus communication time per simulation cycle, $t_{c(bus)}$, derived
earlier in eqs. (1) and (2) are used to analyze the effects of variations
in the number of faults per pass. It was seen earlier that the optimum
operation point for the simulator occurs when the processing time and
communication time are equal, i.e., $t_p = t_{c(bus)}$. An expression for the
number of processors required to meet this condition for a given length
of value vector ($L_v$) can be derived by equating the expressions for $t_p$
and $t_{c(bus)}$. Let $n_o$ be the number of processors at the optimum operation
point and $t_o$ be the length of the processing and communication cycles
at the optimum operation point. Values of n smaller than the optimum
$n_o$ cause the processing to be a bottleneck, while larger values of n
cause communication to be a bottleneck. Equating eqs. (1) and (2)
yields the following expression for $n_o$, the number of processors
required for optimum operation:


$$n_o = -100 + 100\left(\frac{3.1L_v + 2.44}{2L_v + 1}\right)^{0.5}$$


Let a be the number of faults per pass. Then $a = 16L_v$, assuming 16
bits per word. The above expression for $n_o$ can be rewritten as a
function of the number of faults, a, as:


$$n_o(a) = -100 + 100\left(\frac{0.19a + 2.44}{0.125a + 1}\right)^{0.5} \tag{5}$$
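A short numerical sketch of eq. (5) follows. It rests on two assumptions of ours, not statements from the paper: the expression is evaluated in its word-pair form with a numerator coefficient of 3.1 per word pair (so that 0.19 is 3.1/16 rounded), and the result is rounded up to a whole processor. These choices were made because they reproduce the processor counts listed in Table II.

```python
import math


def optimum_processors(faults_per_pass: int, word_bits: int = 16) -> float:
    """Optimum processor count for a time-shared parallel bus.

    Assumed form: n_o = -100 + 100 * sqrt((3.1*L_v + 2.44) / (2*L_v + 1)),
    with L_v = faults_per_pass / word_bits word pairs per value vector.
    """
    l_v = faults_per_pass / word_bits
    return -100 + 100 * math.sqrt((3.1 * l_v + 2.44) / (2 * l_v + 1))


if __name__ == "__main__":
    for a in (16, 32, 64, 256, 1024):
        n_o = optimum_processors(a)
        print(f"a = {a:4d}   n_o = {n_o:6.2f}   rounded up: {math.ceil(n_o)}")
```

The optimum falls slowly as a grows because the ratio inside the square root approaches a constant for long value vectors.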


For $n = n_o(a)$, the processing time per simulation cycle and the bus
communication time per simulation cycle both reduce to:

$$t_o(a) = (15.8 + 0.76a)\,\frac{N_F(a)}{n_o(a)}. \tag{6}$$

For a simulator with the optimum number of processors, $n_o(a)$, the total
time required to simulate a fixed number of faults is obtained by
multiplying the time required for processing one active element,
$t_o(a)/N_F(a)$, by the total number of active elements during the
simulation, $N_T(a)$:

$$T(a) = \frac{t_o(a)}{N_F(a)}\,N_T(a).$$

The total number of active elements during simulation is given by:

$$N_T(a) = N_F(a) \times (\text{number of simulation cycles per pass}) \times (\text{number of passes}).$$

Substituting the expression for $t_o(a)$ given in eq. (6):

$$T(a) = (15.8 + 0.76a)\,\frac{N_T(a)}{n_o(a)}. \tag{7}$$


Define the simulation time ratio as the ratio of the total time required
to simulate a set of faults with a faults per pass to the total time
required to simulate the given set of faults with 16 faults per pass, i.e.,
T(a)/T(16). Using a value of $n_o(16) = 36$ [eq. (5)], the expression for
the simulation time ratio is given by:

$$\frac{T(a)}{T(16)} = \frac{(20.3 + 0.98a)}{n_o(a)}\,\frac{N_T(a)}{N_T(16)}. \tag{8}$$

The values of the simulation time ratio are first calculated theoretically
and then compared with the experimental results.


6.1.1 Theoretical simulation time ratio 


Let simulation activity, $N_T(a)$, refer to the number of active elements
during all passes of a simulation. The simulation activity can be
expected to be inversely proportional to the number of faults per pass:

$$\frac{N_T(a)}{N_T(16)} = \frac{16}{a}.$$

For example, the expected simulation activity at 32 faults per pass 
will be half the simulation activity of 16 faults per pass since the 
number of passes needed will be halved. The theoretical expression 
for the simulation time ratio becomes: 


$$\frac{T(a)}{T(16)} = \frac{(20.3 + 0.98a)}{n_o(a)}\,\frac{16}{a}. \tag{9}$$

Note that the theoretical simulation time ratio is independent of the
simulation activity. Table II gives the variation of the optimum
number of processors, $n_o$, and the variation of the simulation time
ratio as a function of the number of faults per pass, a. The theoretical
results show that the simulation time ratio, and thus the total
simulation time, decreases as the number of faults per pass increases.
Also, it is interesting to note that for 16 faults per pass the operation
peaks at 36 processors, while for 1024 faults per pass the operation
peaks at 25 processors. As the number of faults per pass increases, the
point of optimum operation occurs for a slightly smaller number of
processors. This is because the time required to transfer the increased
data is more than the time required to process the increased data.


Table II—Theoretical simulation time ratio

Faults per    Optimum Number of    Simulation Time Ratio,
Pass, a       Processors, n_o      T(a)/T(16)

  16          36                   1.0
  32          32                   0.81
  64          29                   0.72
 256          26                   0.65
1024          25                   0.64


Table III—Experimental simulation time ratio

Faults per    Simulation Activity,    Optimum Number of    Simulation Time Ratio,
Pass, a       N_T(a)                  Processors, n_o      T(a)/T(16)

  16          17586                   36                   1.0
  32           9374                   32                   0.86
  64           5204                   29                   0.84
 256           1787                   26                   1.06
1024            689                   25                   1.6


6.1.2 Experimental simulation time ratio 


The values of Ny(a) averaged over several experimental runs made 
on the LAMP simulator’? (with a mapping algorithm from deductive 
simulation to parallel simulation as discussed in Section 5.1) are given 
in Table III. Also shown in Table III are values for the simulation 
time ratio derived from eq. (8). 

The experimental simulation time ratio, T(a)/T(16), is plotted 
together with the theoretical simulation time ratio in Fig. 7. 
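The entries of Tables II and III can be recomputed from eqs. (8) and (9). The sketch below copies the processor counts and measured activities from the tables; the function and variable names are ours.

```python
# Optimum processor counts and measured simulation activity, copied from Tables II and III.
N_O = {16: 36, 32: 32, 64: 29, 256: 26, 1024: 25}
ACTIVITY = {16: 17586, 32: 9374, 64: 5204, 256: 1787, 1024: 689}


def time_ratio(a: int, activity_ratio: float) -> float:
    """Eq. (8): T(a)/T(16) for a given activity ratio N_T(a)/N_T(16)."""
    return (20.3 + 0.98 * a) / N_O[a] * activity_ratio


def theoretical_ratio(a: int) -> float:
    """Eq. (9): activity assumed inversely proportional to a."""
    return time_ratio(a, 16 / a)


def experimental_ratio(a: int) -> float:
    """Eq. (8) evaluated with the measured activity."""
    return time_ratio(a, ACTIVITY[a] / ACTIVITY[16])


if __name__ == "__main__":
    for a in N_O:
        print(f"a = {a:4d}   theoretical {theoretical_ratio(a):.2f}   "
              f"experimental {experimental_ratio(a):.2f}")
```

The theoretical column decreases monotonically, while the experimental column has its minimum at 64 faults per pass.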

It is interesting to note that as the number of faults per pass 
increases, the experimental simulation time ratio falls below 1.0 ini- 
tially, i.e., the simulation speeds up. After a certain point, the simu- 
lation time ratio then starts increasing and goes above 1.0. The fastest 
simulator is a 64 faults per pass simulator with 29 processors. On the 
other hand, the theoretical simulation time ratio always stays below 
1.0 and keeps decreasing as the number of faults per pass increases. 
The difference in the experimental and theoretical results can be
explained by examining the variation in simulation activity as the
number of faults per pass changes.

Fig. 7—Experimental and theoretical simulation time ratio for variation in number
of faults per pass—a time-shared parallel bus.

The curve for the theoretical simulation time ratio in Fig. 7 shows 
that the simulation speed increases as the number of faults per pass 
increases. This is as expected, since increasing the number of faults 
per pass decreases the number of passes and thus the fanout search 
and related processing time. In practice, however, there is extra 
simulation activity due to longer value vectors and this tends to 
increase the processing time. As more faults are simulated per vector, 
i.e., as the value of a gets larger, the simulation activity during the 
simulation will be higher than the theoretical. For example, as shown 
earlier, the theoretical activity at 32 faults per pass will be half the 
simulation activity at 16 faults per pass. However, the experimental 
simulation activity at 32 faults per pass will be more than the expected 
half. This is because the active faults in two value vectors at 16 faults 
per pass will not always directly map into one value vector at 32 faults 
per pass. Any active fault in the value vector will cause simulation 
activity even if the good value does not change. The effect of this is to 
cause extra schedulings. This increase in schedulings can be repre- 
sented by an effective increase in the length of the simulation cycle 
and the number of simulation cycles. The runs made on the LAMP 
simulator show that most of the increase is in the length of the 
simulation cycle (i.e., more computation during the simulation cycle). 
The expected simulation activity and the actual simulation activity
obtained from runs made using the LAMP simulator are given in
Table IV.

For 32 faults per pass, the simulation activity is only 1.07 times that 
theoretically expected. Thus the increase in processing time due to 
this extra activity is not substantial and the overall speed of simulation 
is higher due to the greater savings in the fanout search processing. 
For 1024 faults per pass, the simulation activity is 2.51 times that 
theoretically expected. In this case, the increase in processing time 
due to the extra activity is substantial compared to the savings 
obtained in the fanout search processing. This results in lowering the 
overall speed of simulation when compared with 16 faults per pass. 
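The factors 1.07 and 2.51 quoted above follow directly from the Table IV data. The sketch below recomputes them; the names are ours, and the expected activity is taken to be the 16-fault activity scaled by 16/a, as in the theoretical model.

```python
# Measured simulation activity N_T(a) from the LAMP runs (Table IV).
MEASURED = {16: 17586, 32: 9374, 64: 5204, 256: 1787, 1024: 689}


def expected_activity(a: int) -> float:
    """Theoretical activity: the 16-faults-per-pass activity scaled by 16/a."""
    return MEASURED[16] * 16 / a


def activity_inflation(a: int) -> float:
    """Ratio of measured to theoretically expected activity."""
    return MEASURED[a] / expected_activity(a)


if __name__ == "__main__":
    for a in MEASURED:
        print(f"a = {a:4d}   expected {expected_activity(a):8.0f}   "
              f"measured {MEASURED[a]:6d}   inflation {activity_inflation(a):.2f}")
```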

In summary, for the multiprocessor fault simulator with a
parallel-bus-based communication structure, the fastest simulation speed
occurs for 64 faults per pass and 29 processors. Note, however, that the
number of processors required for 64 faults per pass is greater than
the number of processors for 1024 faults per pass. Decreasing the
number of processors for 64 faults per pass to 25 yields a total
simulation time of 13,413 µs. This still favors the 64 faults per pass
simulator over the 1024 faults per pass simulator.
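The 13,413 µs figure can be reproduced under one assumption of ours: with fewer processors than the optimum, processing is the bottleneck, so the total time is the processing expression (15.8 + 0.76a) N_T(a)/n in microseconds, evaluated with the measured activity from Table III.

```python
# Measured simulation activity N_T(a), copied from Table III.
ACTIVITY = {64: 5204, 1024: 689}


def total_time_us(a: int, n: int) -> float:
    """Processing-bound total simulation time (assumed form), in microseconds."""
    return (15.8 + 0.76 * a) * ACTIVITY[a] / n


if __name__ == "__main__":
    for a in (64, 1024):
        print(f"a = {a:4d}, n = 25: {total_time_us(a, 25):,.0f} us")
```

At 25 processors the 64 faults per pass configuration remains the faster of the two.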


6.2 Simulator with cross-point matrix 


The expressions for the processing time per simulation cycle, $t_p$, and
the cross-point matrix communication time per simulation cycle,
$t_{c(matrix)}$, derived earlier in eqs. (1) and (3) are applicable to variations
in the number of faults per pass. The increase in simulation activity
caused by simulating more faults per pass will increase both the
processing time and the communication time for the cross-point
matrix. However, adding one word pair to the value vector will cause
an increase in the communication time that is only 4 percent of the
increase in the processing time. Thus, the communication time will
always be less than the processing time. The cross-point matrix
provides sufficient communication capacity for the parallel fault
simulator and does not cause a communication bottleneck. For faster fault
simulation, more processors can be added.


Table IV—Effect of multiple passes on simulation activity

Faults per    Experimental Simulation    Theoretical Simulation
Pass, a       Activity, N_T(a)           Activity, N_T(a)

  16          17586                      17586
  32           9374                       8793
  64           5204                       4396
 256           1787                       1099
1024            689                        275

The variation of the simulation time ratio as a function of the
number of faults per pass will be investigated to obtain the optimum
number of faults per pass. Since the processing time dominates, the
total time required to simulate a set of faults with a faults per pass
will be equal to the total processing time. Using eq. (1):

$$T(a) = (15.8 + 0.76a)\,\frac{N_T(a)}{n}.$$

The simulation time ratio is given by:

$$\frac{T(a)}{T(16)} = (0.56 + 0.027a)\,\frac{N_T(a)}{N_T(16)}. \tag{10}$$


The experimental and theoretical simulation time ratios are plotted
in Fig. 8. As was the case for the time-shared parallel bus, the
experimental simulation time ratio increases after 64 faults per pass.
This is because the time required to process the increased simulation
activity is greater than the time saved in the fanout search overhead
associated with each pass. For the multiprocessor fault simulator with
a cross-point-matrix-based communication structure, the optimum
number of faults per pass is 64. Using a different value for the number
of faults per pass will decrease the speed of simulation. Note that the
number of processors, n, does not affect the simulation time ratio.
More processors can be added to obtain greater speed.

Fig. 8—Experimental and theoretical simulation time ratio for variation in number
of faults per pass—cross-point matrix.

When compared with the parallel bus, the cross-point matrix provides
greater speed for any value of n greater than the optimum n for
the parallel bus. For example, for 64 faults per pass, the cross-point
matrix can be made faster than the parallel bus by selecting n > 29.
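Eq. (10) can be evaluated with the measured activities to confirm where the cross-point matrix performs best. The sketch below uses the coefficients of eq. (10) and the activity values from Table III; the names are ours.

```python
# Measured simulation activity N_T(a), copied from Table III.
ACTIVITY = {16: 17586, 32: 9374, 64: 5204, 256: 1787, 1024: 689}


def crosspoint_time_ratio(a: int) -> float:
    """Eq. (10): T(a)/T(16) for the cross-point matrix, independent of n."""
    return (0.56 + 0.027 * a) * ACTIVITY[a] / ACTIVITY[16]


if __name__ == "__main__":
    for a in ACTIVITY:
        print(f"a = {a:4d}   T(a)/T(16) = {crosspoint_time_ratio(a):.2f}")
    print("best faults per pass:", min(ACTIVITY, key=crosspoint_time_ratio))
```

The ratio is smallest at 64 faults per pass and exceeds 1.0 at 1024 faults per pass, which matches the behavior described for Fig. 8.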


VII. SUMMARY 


In this paper we presented a special-purpose logic fault simulator 
based on the parallel simulation method. The simulator is expected to 
be two orders of magnitude faster than traditional logic fault simula- 
tors implemented on general-purpose computers. For both the time- 
shared parallel bus and the cross-point matrix, the simulator performs 
the best at 64 faults per pass. Decreasing the number of faults per 
pass slows down the simulation due to fanout search and other 
overhead required for every pass. Increasing the number of faults per 
pass also slows down the simulation due to the increase of simulation 
activity. 

When the parallel bus is used, the power of the simulator can be 
increased over a certain range by increasing the number of slaves. The 
power of the simulator can be further increased by using the cross- 
point matrix. 

The application of the special-purpose simulator to other fault 
simulation methods is being investigated currently.

APPENDIX A
Architecture Description 
A.1 Introduction 


The simulator consists of one master and a multiplicity of slaves 
(processors) interconnected by a communication structure (see Fig. 
2). The communication structure can be either shared or dedicated. 
The circuit to be simulated is partitioned into subcircuits and each 
subcircuit is simulated in a separate processor. The partitioning is 
such that the number of simultaneously active subcircuits (processors) 
is maximum and the number of simultaneously active elements in 
each subcircuit (processor) is minimum while keeping the communi- 
cation from being a bottleneck. 

The circuit to be simulated is initially modeled in the general- 
purpose computer, partitioned and loaded into the slave memories. 
The general-purpose computer performs all functions of input, output, 
and user interactions. The simulation is carried out in the multi- 
processor simulator. 

The simulator can be programmed to output intermediate results to 
the general-purpose computer. It also can be interrupted by the gen- 
eral-purpose computer for intermediate results. The user can ask for 
information about a simulation run while it is in progress (e.g., the 
status of a variable) and make certain run-time decisions like contin- 
uing simulation, applying extra input vectors, or stopping. After sim- 
ulating each vector input, the simulation results and any other user 
requested information are sent to the general-purpose computer. User- 
requested information typically includes output values of elements 
(monitored points) at specific simulated times or under some other 
specified conditions. The general-purpose computer formats this in- 
formation for suitable presentation to the user. 


A.2 Multiprocessor operation 


At the beginning of each simulation cycle the master sends primary 
input values (if any) to the appropriate slaves using the communication 
structure. The master then issues a start signal to the slaves. This
signal informs the slaves to start processing for the next simulation
cycle. During the processing of a simulation cycle a slave unit may 
generate data for the other slaves or the master. The data are sent to 
the destination slave or the master using the communication structure. 
Only data for the subsequent time interval are transferred between 
the slaves to reduce the amount of information sent over the commu- 
nication structure, thus minimizing the communication overhead. 
Therefore, the scheduled time is not sent. 

Each slave informs the master when it has finished processing and 
transferring data for the current simulation cycle. When all slaves 
have informed the master about their completion of processing for the 
current simulated time interval, and also the master has finished 
transferring any primary input values scheduled for the next simulated 
time interval to the slaves, the master issues a start signal to the slaves 
for the next simulation cycle. 

There are two signal lines between each slave unit and the master. 
The master signals the slaves using a START signal and the slaves 
signal the master using the DONE line. The START line from the 
master initiates processing for the next simulation cycle. The DONE 
line will become one when all the slaves have finished processing for 
the current simulation cycle. 
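The START/DONE handshake can be sketched in software with two reusable barriers. This is only an analogy of ours: the paper describes hardware signal lines, and the names `run_cycles`, `start`, and `done` are hypothetical.

```python
import threading


def run_cycles(num_slaves: int, num_cycles: int) -> list:
    """Run lock-step simulation cycles; return (cycle, slave) pairs in completion order."""
    start = threading.Barrier(num_slaves + 1)  # START: master releases every slave
    done = threading.Barrier(num_slaves + 1)   # DONE: passes once all slaves have finished
    log = []
    lock = threading.Lock()

    def slave(index: int) -> None:
        for cycle in range(num_cycles):
            start.wait()                       # wait for START from the master
            with lock:
                log.append((cycle, index))     # stand-in for processing one cycle
            done.wait()                        # report completion

    threads = [threading.Thread(target=slave, args=(i,)) for i in range(num_slaves)]
    for thread in threads:
        thread.start()
    for _ in range(num_cycles):
        # The master would send primary input values to the slaves here.
        start.wait()
        done.wait()
    for thread in threads:
        thread.join()
    return log


if __name__ == "__main__":
    print(run_cycles(num_slaves=3, num_cycles=2))
```

Because no slave can pass the START barrier for cycle k + 1 until every slave has reached the DONE barrier for cycle k, the work of different cycles never interleaves.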


A.3 Slave unit 


Each slave unit consists of a processing unit (PU), an input FIFO 
buffer (IFB), an output FIFO buffer (OFB), an input data sequencer 
(IDS), and an output data sequencer (ODS). 

The slave unit PUs perform the actual fault simulation. The PU 
stores any data it has for other PUs or the master in the OFB. The 
ODS makes a request for the communication structure if there are 
any data to be transferred from the OFB. The ODS of the slave, if 
granted the use of the communication structure, takes data from the 
OFB and sends it over the communication structure to other slaves or 
the master. The data are received by the IDS of the destination slave 
or the master. Any data received by an IDS are put in its IFB. End of 
Data (EOD) flags are used to separate data streams since a PU can be 
writing new data to the OFB before its ODS has finished transferring 
current data, and similarly, an IDS can be receiving new data in the 
IFB before its PU has finished reading current data. 
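The role of the EOD flag can be illustrated with a small queue model. The sketch is ours and is not the hardware design: `Fifo`, `put_eod`, and `drain_cycle` are hypothetical names, and a Python sentinel object stands in for the flag.

```python
from collections import deque

EOD = object()  # hypothetical sentinel standing in for the End of Data flag


class Fifo:
    """FIFO buffer in which EOD flags separate the data of successive cycles."""

    def __init__(self) -> None:
        self._queue = deque()

    def put(self, item) -> None:
        self._queue.append(item)

    def put_eod(self) -> None:
        self._queue.append(EOD)

    def drain_cycle(self) -> list:
        """Remove and return the items queued ahead of the next EOD flag."""
        items = []
        while self._queue:
            item = self._queue.popleft()
            if item is EOD:
                break
            items.append(item)
        return items


if __name__ == "__main__":
    ofb = Fifo()
    ofb.put("event-1")
    ofb.put("event-2")
    ofb.put_eod()        # end of the current cycle's data
    ofb.put("event-3")   # the writer has already started on the next cycle
    print(ofb.drain_cycle())  # data of the current cycle only
    print(ofb.drain_cycle())  # data queued so far for the next cycle
```

The writer can run ahead of the reader without the two data streams being mixed, which is the property the text attributes to the EOD flags.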

The slave unit PU operation can be described in terms of two 
essentially concurrent processes, namely the simulation cycle (execu- 
tion of simulation algorithm) and the communication cycle (commu- 
nication of events). Data generated during a simulation cycle are 
transferred as they are generated in a corresponding communication 
cycle (see Fig. 5). The algorithm executed by the Slave Unit PU is
similar to the traditional parallel simulation algorithm used on
general-purpose computers except that it is modified to operate on a data
structure distributed over several processors (see Appendix C). 

A communication cycle is the period in between two START com- 
mands issued from the master. This cycle is phased with respect to 
the simulation cycle as shown in Fig. 9. 


A.4 Master processor 


The master processor is the interface between the general-purpose 
computer and the simulator. Its main functions are to keep track of 
simulated time, keep the slaves in synchronism, supply the slaves with 
primary input values, and gather the primary output values from the 
slaves. The configuration of the master is similar to that of a slave 
unit and is shown in Fig. 10. It consists of a central processing unit 
(CPU) with some local memory, an IFB, an OFB, an IDS, and an 
ODS. 


A.5 Communication structure 


The communication structure is used as a medium for transferring
data between the slaves and between the slaves and the master. Either
a shared or a dedicated structure can be used for the multiprocessor
simulator. Communication structures based on the time-shared parallel
bus and the cross-point matrix are discussed in detail in Ref. 5.

Fig. 10—Master configuration.


APPENDIX B 
Data Structure for Parallel Method 


The following data tables are used by each PU for its operation (see 
also Fig. 11): 

1. Element table—This table contains the interconnection data and 
signal values for the circuit. For each element it contains the value 
vector, type, delay, fan-in list pointer and corresponding fan-in list, 
internal fanout list pointer and fanout list, external fanout list pointer 
and external fanout list. The value vector consists of a word pair. The 
first bit of each word of the word pair represents the good machine, 
while the remaining bits represent faulty machines. Three values can 
be represented: 0, 1, and X. If multiword parallel simulation is required, 
then additional word pairs are used to represent more faulty machines. 
The internal fanout list pointer and corresponding fanout lists give 
the fanout that remains in the subcircuit. The external fanout list 
pointer and corresponding fanout lists give fanout that goes to subcir- 
cuits located in other slaves. An element may have only internal 
fanout, only external fanout, or both internal and external fanout. 
Note that storing the external fanout takes up more space than storing 
an internal fanout, since both the destination processor address and 
element index have to be stored for the external fanout. 

2. Activity list—This list is used to keep track of active elements 
during a simulation time interval. These elements are to be evaluated.

Fig. 11—Data tables for PU operation (parallel simulation).


3. Timing wheel—This data area contains the events that are 
scheduled in the future. A large amount of work has been done in this 
area,’>'® 
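One way to picture the word-pair value vector is sketched below. The paper does not state the bit encoding, so the one used here is an assumption of ours: the first word marks the machines whose signal may be 0, the second marks those whose signal may be 1, and X sets both bits. Bit 0 is taken to be the good machine.

```python
WORD_BITS = 16
MASK = (1 << WORD_BITS) - 1


def encode(values: str) -> tuple:
    """Pack up to 16 signal values ('0', '1', 'X') into a (may_be_0, may_be_1) word pair."""
    may_be_0 = may_be_1 = 0
    for bit, value in enumerate(values):
        if value in "0X":
            may_be_0 |= 1 << bit
        if value in "1X":
            may_be_1 |= 1 << bit
    return may_be_0, may_be_1


def decode(pair: tuple, count: int) -> str:
    """Unpack the first `count` machines of a word pair into a string of values."""
    may_be_0, may_be_1 = pair
    out = []
    for bit in range(count):
        zero, one = (may_be_0 >> bit) & 1, (may_be_1 >> bit) & 1
        out.append("X" if zero and one else "1" if one else "0")
    return "".join(out)


def and_gate(a: tuple, b: tuple) -> tuple:
    """Three-valued AND of all machines at once, using word-wide logic operations."""
    return (a[0] | b[0]) & MASK, (a[1] & b[1]) & MASK


def not_gate(a: tuple) -> tuple:
    """Three-valued NOT: swap the two words of the pair."""
    return a[1], a[0]


if __name__ == "__main__":
    a, b = encode("1X10"), encode("110X")
    print(decode(and_gate(a, b), 4))
```

A single pair of word operations evaluates the good machine and all faulty machines together, which is why every faulty value is evaluated even when only one is active.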


APPENDIX C 
Parallel Simulation Algorithm 


1. Update data from timing wheel

    for each event in list L_i of Timing Wheel
    begin
        update value vector pointer in Element Table;
        for each fanout f of updated element
            if activity flag of f not set {
                set activity flag;
                put f on activity list;
            }
    end
    deallocate space on Timing Wheel;

2. Prepare external events for next time interval

    for each event in list L_{i+1} of Timing Wheel
    begin
        if element has external fanout {
            for each external fanout {
                move destination processor address to OFB;
                move element number to OFB;
                for each word of value vector associated with event
                    move word to OFB;
            }
            if element does not have internal fanout also
                deallocate space on Timing Wheel;
        }
    end

3. Update data from IFB

    enable interrupts from IFB;
    for each interrupt received from IFB
    begin
        if event is not EOD flag {
            for each word of value vector associated with event
                move word to Element Table;
            for each fanout f of updated element
                if activity flag of f not set {
                    set activity flag;
                    put f on activity list;
                }
        }
        else {
            disable interrupts from IFB;
            reset activity flags in Element Table;
            go to step 4 (Evaluate/Schedule);
        }
    end
    (The EOD flag signifies end of data present in the IFB for the
    current simulation cycle. The EOD flag is loaded by the IDS
    when it receives a START signal from the master.)

4. Evaluate/schedule

    for each entry in activity list
    begin
        inject faults, if any, on inputs;
        for each set of words associated with input value vectors
        of element
        begin
            calculate corresponding output word;
            if output word changed from previous value
                set change flag;
        end
        inject faults, if any, on outputs;
        if change flag set or faults injected {
            schedule event on Timing Wheel;
            if (element delay is 1 && element has external fanout)
                for each external fanout {
                    move destination processor address to OFB;
                    move element number to OFB;
                    for each word of value vector associated with event
                        move word to OFB;
                }
        }
    end

5. End cycle

    store EOD flag in OFB;
    increment i;
    go to STEP 1.
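The timing wheel and the activity list used in step 1 can be sketched as follows. This is a simplified model of ours, not the authors' implementation: the class and function names are hypothetical, values are plain integers rather than word pairs, and a Python set plays the role of the activity flags.

```python
class TimingWheel:
    """Circular array of event lists; each slot holds the events due in one time interval."""

    def __init__(self, size: int) -> None:
        self.slots = [[] for _ in range(size)]
        self.now = 0

    def schedule(self, delay: int, element: str, value: int) -> None:
        """Queue (element, value) to take effect `delay` intervals from now."""
        if not 0 < delay < len(self.slots):
            raise ValueError("delay must fit on the wheel")
        self.slots[(self.now + delay) % len(self.slots)].append((element, value))

    def advance(self) -> list:
        """Step to the next interval; return its events and free the slot."""
        self.now += 1
        index = self.now % len(self.slots)
        events, self.slots[index] = self.slots[index], []
        return events


def update_from_wheel(events: list, values: dict, fanout: dict) -> list:
    """Apply due events, then put each affected fanout element on the activity list once."""
    activity_list, flagged = [], set()
    for element, value in events:
        values[element] = value
        for f in fanout.get(element, ()):
            if f not in flagged:  # activity flag of f not set
                flagged.add(f)
                activity_list.append(f)
    return activity_list


if __name__ == "__main__":
    wheel = TimingWheel(4)
    wheel.schedule(1, "g1", 1)
    values = {}
    print(update_from_wheel(wheel.advance(), values, {"g1": ["g2", "g3"]}))
```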


APPENDIX D 
Glossary 


BIU—bus interface unit
CPU—central processing unit (part of a master unit)
EOD—end of data (flag)
IDS—input data sequencer
IFB—input first in first out (FIFO) buffer
$L_v$—number of word pairs used in parallel simulation
LAMP—logic analyzer for maintenance planning
LSI—large-scale integration
$L_i$—current list of timing wheel
N—average number of active elements per simulation cycle in
true value simulation
$N_T$—total number of active elements during all passes of a
simulation
$N_F$—average number of active elements per simulation cycle in
fault simulation with 16 faults per pass
ODS—output data sequencer
OFB—output FIFO buffer
PU—processing unit (part of a slave unit)
T—time required to simulate a given set of faults
c—average number of elements in an element string
$f_i$—average fan-in of an element
$f_o$—average fanout of an element
j—number of events for which the channel is found busy in
cross-point matrix
k—imbalance factor due to nonideal partitioning
n—number of processors in the multiprocessor simulator
$n_o$—number of processors required for optimum operation of
multiprocessor simulator
t,—length of simulation cycle for a single processor simulator 
t,—time required to process one active element 
$t_{br}$—bus release time for a parallel bus structure
$t_{brg}$—bus request and grant time for a parallel bus structure
$t_c$—total communication time during one simulation cycle
$t_{c(bus)}$—total communication time during one simulation cycle for a
single parallel bus structure
$t_{c(matrix)}$—total communication time during one simulation cycle for a
matrix structure
$t_{c(mbus)}$—total communication time during one simulation cycle for a
multiple parallel bus structure
$t_{da}$—data acknowledge time for a parallel bus structure
$t_{ds}$—address and data setup time for a parallel bus structure
$t_m$—length of simulation cycle for a multiprocessor simulator
$t_o$—average processing time, $t_p$, for number of processors $n = n_o$
$t_p$—average processing time per processor during one simulation
cycle, where average processing time consists of the time
required to process all active elements and schedule resulting
events
VLSI—very large-scale integration
w—number of words to be transferred across the communication
structure for one active event


Two New Kinds of Biased Search Trees

In this paper, we introduce two new kinds of biased search trees: biased a,
b trees and pseudo-weight-balanced trees. A biased search tree is a data
structure for storing a sorted set in which the access time for an item depends 
on its estimated access frequency in such a way that the average access time 
is small. Bent, Sleator, and Tarjan were the first to describe classes of biased 
search trees that are easy to update; such trees have applications not only in 
efficient table storage but also in various network optimization algorithms. 
Our biased a, b trees generalize the biased 2, b trees of Bent, Sleator, and 
Tarjan. They provide a biased generalization of B-trees and are suitable for 
use in paged external memory, whereas previous kinds of biased trees are 
suitable for internal memory. Our pseudo-weight-balanced trees are a biased 
version of weight-balanced trees much simpler than Bent’s version. Weight 
balance is the natural kind of balance to use in designing biased trees; pseudo- 
weight-balanced trees are especially easy to implement and analyze. 


I. INTRODUCTION


The following problem, which we shall call the dictionary problem, 
occurs frequently in computer science. Given a totally ordered universe 
U, we wish to maintain one or more subsets of U under the following 
operations, where R and S denote any subsets of U and i denotes any 
item in U: 

access (i, S)—If item i is in S, return a pointer to its location.
Otherwise, return a special null pointer.


* Research done partly while a summer employee of Bell Laboratories and
partly while a graduate student supported by Air Force grant AFOSR-80-042.

insert (i, S)—Insert i in S, assuming it is not previously there.

delete (i, S)—Delete i from S. 

join (R, S) (two-way join)—Return the set consisting of the union 
of R and S, assuming that every item in R precedes every item in S. 
This operation destroys R and S, and can be regarded as a concaten- 
ation of R and S. 

split (i, S)—Split S into three sets L, I, and R, where L and R are
the sets of items strictly smaller and strictly larger than i, respectively,
and I = {i} if i is in S (three-way split), I = ∅ if i is not in S (two-way
split).
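The five operations can be pinned down with a deliberately naive reference model. The sketch below stores each set as a sorted Python list, so it illustrates only the semantics of the operations, not the search-tree implementation or its time bounds; an index into the list stands in for the pointer returned by access.

```python
from bisect import bisect_left, insort


def access(i, s: list):
    """Return the position of i in the sorted list s, or None if i is absent."""
    k = bisect_left(s, i)
    return k if k < len(s) and s[k] == i else None


def insert(i, s: list) -> None:
    """Insert i, assuming it is not already present."""
    insort(s, i)


def delete(i, s: list) -> None:
    """Delete i from s if it is present."""
    k = access(i, s)
    if k is not None:
        del s[k]


def join(r: list, s: list) -> list:
    """Two-way join: every item of r must precede every item of s; r and s are destroyed."""
    if r and s and not r[-1] < s[0]:
        raise ValueError("every item in r must precede every item in s")
    joined = r + s
    r.clear()
    s.clear()
    return joined


def split(i, s: list) -> tuple:
    """Return (L, I, R): items smaller than i, [i] if present else [], items larger than i."""
    k = bisect_left(s, i)
    found = k < len(s) and s[k] == i
    upper = k + 1 if found else k
    return s[:k], ([i] if found else []), s[upper:]


if __name__ == "__main__":
    items = [1, 3, 5, 7]
    print(split(5, items))  # three-way split
    print(split(4, items))  # two-way split
```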

One way to solve the dictionary problem is to store the items of 
each set in the external nodes of a search tree in left-to-right order, 
one item per external node. To guide the operations, the search tree 
also contains auxiliary items, called keys, in the internal nodes. The 
worst-case access time in a search tree is proportional to the depth of 
the tree. By imposing any one of a number of well-known balance 
conditions on the tree, we can guarantee that its depth is O(log n), 
where n is the number of items it contains. Such a balance condition 
can be maintained during update operations by performing appropriate 
local rearrangements of the tree. With balanced search trees, each of 
the dictionary operations has an O(log n) time bound. Examples of 
balanced trees include height-balanced (AVL) trees,’ 2, 3 trees,” B- 
trees,® and trees of bounded balance.* 

In many applications of search trees the access frequencies of 
different items are different, and we would like our data structure to 
take this into account. To deal with this problem formally we assume 
that each item i has a known weight $w_i$ providing an estimate of the
access frequency. The biased dictionary problem is that of implementing
the dictionary operations so that operations on heavier items are
faster than those on lighter items. In particular, when representing a
set S as a search tree, we wish to bias the tree so as to approximately
minimize its total weighted depth $\sum_{i \in S} w_i d_i$, where $d_i$ is the depth of
the external node containing item i, while preserving the ability to
update the tree rapidly. In addition to the five dictionary operations, 
we allow the following operation for changing the weight of an item: 

reweight (i, w, S)—Redefine the weight of item i in set S to be w. 

The following theorem, due to Shannon, gives a lower bound on the 
total weighted depth of a search tree: 


Theorem 1:° If T is a search tree for a set S and every node of T has no 
more than b children, then 


$$\sum_{i \in S} w_i d_i \;\ge\; W \sum_{i \in S} \frac{w_i}{W} \log_b \frac{W}{w_i},$$

where $W = \sum_{i \in S} w_i$ is the total weight of the items in S.

In light of Theorem 1, our goal is to devise classes of search trees
that are easy to update and have $d_i = O(\log_b W/w_i)$ for every item i.
We call $O(\log_b W/w_i)$ the ideal access time of item i.
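A small numerical example may help fix the quantities in Theorem 1. The sketch below assumes the bound in the form of the sum over i of w_i log_b(W/w_i) and compares it with the weighted depth of one hand-built binary tree; the weights, depths, and function names are illustrative choices of ours.

```python
import math


def weighted_depth(weights: list, depths: list) -> float:
    """Total weighted depth: sum of w_i * d_i over the external nodes."""
    return sum(w * d for w, d in zip(weights, depths))


def shannon_lower_bound(weights: list, b: int = 2) -> float:
    """Lower bound of Theorem 1 for trees whose nodes have at most b children."""
    total = sum(weights)
    return sum(w * math.log(total / w, b) for w in weights)


if __name__ == "__main__":
    weights = [4, 2, 1, 1]
    depths = [1, 2, 3, 3]  # a binary tree in which heavier items sit closer to the root
    print("weighted depth:", weighted_depth(weights, depths))
    print("lower bound:   ", shannon_lower_bound(weights, b=2))
```

For these weights every ratio W/w_i is a power of 2, so the tree meets the bound exactly; in general a tree can only approach it to within a constant factor.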

Bent, Sleator, and Tarjan®*® have devised several kinds of such 
biased search trees. Our work is an extension of theirs. A thorough 
discussion of previous work by others on the biased dictionary problem 
may be found in Ref. 8. 

In their running time analyses, Bent, Sleator, and Tarjan used a 
technique called amortization, which we shall also use. The idea of 
amortization is to average the running time of individual operations 
over a (worst-case) sequence of operations. As a tool in deriving 
amortized time bounds we use credits. A credit will pay for one unit of 
computing time. To each operation we allocate a certain number of 
credits, called the credit time or amortized time of the operation. If a 
given operation does not need all its credits, we can save them for use 
in later operations; if an operation needs more than its share of credits 
we can use those previously saved. In any analysis using credits, the 
objective is to prove that an arbitrary sequence of operations can be 
performed without running out of credits. 

Three points about amortization using credits are worth making. 
First, credits are a way of charging earlier operations for later ones. If 
a credit analysis is successful, we can assert that any sequence of 
operations requires an amount of actual time that is at most a constant 
multiplied by the sum of the credit times of the individual operations; 
slow operations are only possible if there are corresponding earlier 
fast ones. Second, although the word “average” appears in the descrip- 
tion of the technique, it is not the usual kind of average-case analysis, 
and in particular we make no probabilistic assumptions; we obtain 
worst-case bounds holding for any sequence of operations. Third, 
credits serve as a kind of “potential energy”: we place them in regions 
of search trees that may cause abnormally long update operations. 
This idea may illuminate the credit invariants we define below. 
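
The credit technique can be illustrated on a much simpler structure than a biased tree. The sketch below is not from the paper: it applies the same bookkeeping to a binary counter, allocating two credits per increment and charging one credit per bit flip. The assertions inside `run` check that the saved credits never run out and that each 1 bit carries exactly one saved credit, a toy analogue of a credit invariant.

```python
from typing import List, Tuple


def increment(bits: List[int]) -> int:
    """Increment an LSB-first binary counter in place; return the number of bit flips."""
    flips = 0
    i = 0
    while i < len(bits) and bits[i] == 1:
        bits[i] = 0          # a carry flips a 1 to 0
        flips += 1
        i += 1
    if i == len(bits):
        bits.append(1)
    else:
        bits[i] = 1
    return flips + 1         # plus the final 0 -> 1 flip


def run(n: int, allocation: int = 2) -> Tuple[int, int]:
    """Perform n increments; return (total actual flips, credits left over)."""
    bits: List[int] = []
    saved = 0
    total_actual = 0
    for _ in range(n):
        saved += allocation              # credit time of this operation
        cost = increment(bits)           # actual time of this operation
        saved -= cost
        total_actual += cost
        assert saved >= 0                # we never run out of credits
        assert saved == sum(bits)        # invariant: one credit on every 1 bit
    return total_actual, saved
```

A slow increment (a long carry chain) is paid for by credits left behind by earlier fast increments, so any sequence of n increments costs at most 2n flips.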

The remainder of the paper consists of three sections. In Sections 
II and III, we define and analyze biased a, b trees, which generalize 
the biased 2, b trees of Ref. 8 and provide a biased version of B-trees. 
Biased a, b trees are a form of biased tree appropriate for paged 
external memory; earlier forms of biased trees are more appropriate 
for internal memory. In Section IV we introduce pseudo-weight- 
balanced trees, which give a biased version of weight-balanced trees 
much simpler than Bent’s earlier version.® Weight balance is the 
natural kind of balance to use in designing a biased search tree; pseudo- 
weight-balanced trees are especially easy to implement and analyze. 

We shall assume that the reader is familiar with search trees. In 
particular we shall not discuss the use and updating of keys, and we 
shall draw freely on the results and techniques of Ref. 8. We shall use
the terminology of Ref. 8 except that we use “external node” in place 
of “leaf”. When appropriate we shall regard a node x of a search tree 
as denoting the entire subtree rooted at x, with the context resolving 
whether a node or a tree is meant. The null node denotes the empty 
search tree. We denote the parent of a node x by p(x); p(x) = null if 
x is a tree root. 


II. LOCALLY BIASED a, b TREES


Our first class of biased trees uses height balance to guarantee fast 
access and variable node size to allow easy updating. The class is 
parameterized by two positive integer constants a and b such that 2 ≤
a ≤ ⌈b/2⌉. The integer b specifies the maximum allowed number of
children of an internal node. If an internal node has at least a children, 
we say it is filled; otherwise it is underfilled. (An external node is 
always filled.) Ideally, we would like every node to have at least a 
children; since in our scheme this is impossible to achieve, we allow 
underfilled nodes but treat them specially. 

If x is a node in a search tree, we define its weight w(x) to be the 
sum of the weights of all items in descendants of x. (We use the 
convention that every node is a descendant of itself.) We define the 
rank s(x) of x recursively by s(x) = ⌊log_a w(x)⌋ if x is an external node,
s(x) = 1 + max{s(y) | p(y) = x} if x is an internal node. A node x is
minor if x is not a root and either s(x) < s(p(x)) − 1 or x is underfilled.
All other nodes are major. A locally biased a, b tree is a search tree 
with the following properties (see Fig. 1): 

1. Every internal node has at least two and at most b children; 





Fig. 1—A biased 3, b tree. The numbers in nodes are ranks, and the numbers beside 
nodes are credits for the credit invariant. 

2. If x is a minor node, any adjacent sibling of x is both major and
external. When a node x has this property, we say the tree is locally 
biased at x. 

We call two nodes r-compatible if they can be adjacent children of 
a node of rank r in a biased a, b tree. That is, two nodes are r- 
compatible if both have rank at most r — 1 and either at least one has 
rank r — 1 and is external or both have rank r — 1 and at least a 
children each. 
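
The definitions above can be checked mechanically. The following is a minimal sketch under stated assumptions: a tree is encoded as nested Python lists whose leaves are positive integer weights (the encoding and all function names are hypothetical), and ranks are recomputed recursively for clarity rather than cached as a real implementation would do.

```python
from typing import List, Union

# Assumed encoding: an int is the weight of an external node,
# a list holds the children of an internal node.
Tree = Union[int, list]


def floor_log(w: int, a: int) -> int:
    """Largest s with a**s <= w, for integer w >= 1 (avoids float rounding)."""
    s = 0
    while a ** (s + 1) <= w:
        s += 1
    return s


def rank(t: Tree, a: int) -> int:
    """s(x): floor(log_a w(x)) for external nodes, 1 + max child rank otherwise."""
    if isinstance(t, int):
        return floor_log(t, a)
    return 1 + max(rank(c, a) for c in t)


def filled(t: Tree, a: int) -> bool:
    """External nodes are always filled; internal nodes need at least a children."""
    return isinstance(t, int) or len(t) >= a


def minor_children(t: list, a: int) -> List[bool]:
    """For each child of t, whether it is minor."""
    r = rank(t, a)
    return [rank(c, a) < r - 1 or not filled(c, a) for c in t]


def locally_biased(t: Tree, a: int, b: int) -> bool:
    """Check properties 1 and 2 of a locally biased a, b tree."""
    if isinstance(t, int):
        return True
    if not 2 <= len(t) <= b:
        return False
    minor = minor_children(t, a)
    for i, is_minor in enumerate(minor):
        if is_minor:
            for j in (i - 1, i + 1):     # adjacent siblings must be major and external
                if 0 <= j < len(t) and (minor[j] or not isinstance(t[j], int)):
                    return False
    return all(locally_biased(c, a, b) for c in t)
```

For example, with a = 2 and b = 3 the tree `[4, 1]` is locally biased (the light node sits next to a major external sibling), whereas `[1, 1, 4]` is not, because two minor nodes are adjacent.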

If a = 2 we obtain exactly the biased 2, b trees of Bent, Sleator, and 
Tarjan. Our main new idea is in the definition and handling of 
underfilled nodes. Our first theorem guarantees that a, b trees have 
ideal access time if b is chosen appropriately. 

Lemma 1: Consider any biased a, b tree. If x is an external node,
$a^{s(x)} \le w(x) < a^{s(x)+1}$. If x is an internal node with at least one minor
child, $a^{s(x)-1} \le w(x)$. If x is an internal node with k children,
$k' a^{s(x)-2} \le w(x)$, where $k' = \min\{k, a\}$.

Proof: The first part of the lemma is immediate from the definition of
rank. The second part follows from the first part and property 2: if x
has a minor child, it must have another child that is major and
external. We prove the third part by induction on the height of x. If x
has a minor child, then $k' a^{s(x)-2} \le a^{s(x)-1} \le w(x)$ by the second part
of the lemma. Otherwise, all children of x are major. Let y be a child
of x. If y is external, or internal with a minor child, then $a^{s(x)-2} =
a^{s(y)-1} \le w(y)$. Otherwise, y has at least a major children, and by the
induction hypothesis $a^{s(x)-2} = a \cdot a^{s(y)-2} \le w(y)$. Summing over the k
children of x gives $k a^{s(x)-2} \le w(x)$. □

Lemma 2: If x is an external node in a biased a, b tree of total weight
W, the depth of x is at most $\log_a (W/w(x)) + 3$.

Proof: Let r be the root of the tree and d the depth of x. Since the
rank increases by at least one from child to parent, $d \le s(r) - s(x)$.
Lemma 1 implies $\log_a w(x) \le s(x) + 1$ and $\log_a W = \log_a w(r) \ge s(r) - 2$.
Combining inequalities gives the lemma. □

Theorem 2: A biased a, b tree has ideal access time, with a constant
factor proportional to $\log_a b$.

Proof: Immediate from Lemma 2. □
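
The step "combining inequalities" in Lemma 2, and the origin of the constant factor in Theorem 2, can be written out as follows. This is only a restatement of the argument above, using the bounds on s(r) and s(x) from Lemma 1 and a change of logarithm base.

```latex
\begin{align*}
d &\le s(r) - s(x) \\
  &\le \bigl(\log_a W + 2\bigr) - \bigl(\log_a w(x) - 1\bigr) \\
  &= \log_a \frac{W}{w(x)} + 3, \\[4pt]
\log_a \frac{W}{w(x)} &= (\log_a b)\,\log_b \frac{W}{w(x)} .
\end{align*}
```

The last line shows why the depth is within a factor of $\log_a b$ (plus an additive constant) of the ideal access time $\log_b (W/w(x))$.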

According to Theorem 2, to minimize the access time in a, b trees,
we should choose b as small as possible. The requirement b ≥ 2a − 1
is necessary to allow efficient updating. The best choice of b seems to
be 2a − 1 or 2a. The choice b = 2a allows purely top-down updates
(see the end of Section III). The choice b = 2a − 1 gives a biased
version of ordinary B-trees.* Any other choice gives a biased version 
of “hysterical” or “weak” B-trees.°'' Note also that Theorem 2 gives 
a worst-case bound, not an amortized bound, on the access time.

As Ref. 8 shows, all the update operations on search trees can be
carried out using one or more joins. Our next task is thus to define a 
join algorithm for biased a, b trees. 


Algorithm 1: local join (x, y). Join two locally biased a, b trees with 
roots x and y, assuming that all items in tree x precede all items in 
tree y. 

Case 0—x = null or y = null. If x = null, return y; if y = null,
return x. 

Case 1—s(x) ≥ s(y) and x and y are (s(x) + 1)-compatible, or s(x)
< s(y) and x and y are (s(y) + 1)-compatible. Create a new node with 
x and y as its children and return the new node. 

Case 2—s(x) = s(y) and x and y are not (s(x) + 1)-compatible. Let
u be the rightmost child of x and v the leftmost child of y (see Fig.
2a). Perform join (u, v), letting w be the root of the resulting tree. If
s(w) < s(x), construct a node z whose children are those of x not
including u, the node w, and the children of y not including v. If s(w) =
s(x), construct a node z whose children are those of x not including u,
those of w, and those of y not including v (see Fig. 2b). In either case,
if z has more than b children, split it into two nodes z′ and z″ with z
as parent, dividing the old children of z as evenly as possible between
z′ and z″ (see Fig. 2c). Return z.

Case 3—s(x) > s(y) and x and y are not (s(x) + 1)-compatible. Let
u be the rightmost child of x (see Fig. 2d). Perform join (u, y), letting 
v be the root of the resulting tree. If s(v) < s(x), replace u as a child 
of x by v. If s(v) = s(x), replace u as a child of x by the set of children 
of v. If x now has more than b children, split it into two nodes x’ and 
x” with x as parent, dividing the old children of x as evenly as possible 
between x’ and x”. Return x. 

Case 4—s(x) < s(y) and x and y are not (s(y) + 1)-compatible. Sym-
metric to Case 3.
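
The case conditions of Algorithm 1 depend only on the ranks of the two roots, on whether they are external, and on how many children they have. The sketch below classifies a pair of roots accordingly. The `Root` record and the function names are hypothetical, and the rank condition in Case 1 is read as s(x) ≥ s(y), which is an assumption about the intended inequality.

```python
from typing import NamedTuple, Optional


class Root(NamedTuple):
    rank: int        # s(x)
    children: int    # number of children; 0 marks an external node

    @property
    def external(self) -> bool:
        return self.children == 0


def r_compatible(x: Root, y: Root, r: int, a: int) -> bool:
    """Can x and y be adjacent children of a node of rank r?"""
    if x.rank > r - 1 or y.rank > r - 1:
        return False
    top_external = (x.rank == r - 1 and x.external) or (y.rank == r - 1 and y.external)
    both_filled = (x.rank == r - 1 and y.rank == r - 1
                   and x.children >= a and y.children >= a)
    return top_external or both_filled


def local_join_case(x: Optional[Root], y: Optional[Root], a: int) -> int:
    """Return the number (0-4) of the case of local join that applies."""
    if x is None or y is None:
        return 0
    if r_compatible(x, y, max(x.rank, y.rank) + 1, a):
        return 1
    if x.rank == y.rank:
        return 2
    return 3 if x.rank > y.rank else 4
```

For instance, two internal roots of equal rank fall into Case 1 when both are filled and into Case 2 when either is underfilled.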


Theorem 3: Given two biased a, b trees with roots x and y, the join 
algorithm produces a biased a, b tree whose root has rank max{s(x), 
s(y)} or max{s(x), s(y)} +1. In the latter case, the root of the new tree 
has exactly two children. 


Proof: An easy induction on rank shows that the root of the tree 
produced by the join has rank max{s(x), s(y)} or max{s(x), s(y)} +1 
and that in the latter case the root has exactly two children. Further- 
more, it is clear that every internal node in the new tree has at least 
two and at most b children. All that remains is to show that any two 
adjacent siblings in the new tree are allowed to be adjacent by property 
2. We prove this by induction on rank, using the same cases as in the 
algorithm. 

Case 1—Assume without loss of generality that s(x) ≥ s(y). Since

Fig. 2—Cases of the join algorithm. (a) Situation at the beginning of Case 2. (b) 
Recursive call join (u, v) produces a tree of rank s(x) with root w. (c) Division of an 
overfilled root. (d) Case 3. 


x and y are (s(x) + 1)-compatible, the new tree is locally biased at x 
and y. 

Case 2—For the moment, ignore the split of z if it takes place. Since 
x and y are not (s(x) + 1)-compatible, neither x nor y is external, and
u and v exist. Suppose the left sibling of u, say q, is minor. Then u is
major and external, which means that the join of u and v will
immediately terminate in Case 1, and the new right sibling of q will be u.
The symmetric statement holds for the right sibling of v. Finally, 
suppose w is minor, i.e., s(w) < s(x) — 1 or w has fewer than a 
children. Then both u and v must be minor as children of x and y, 
respectively, and both adjacent siblings of w in the new tree will be 
major and external. Thus the tree before the split has property 2. 
Splitting z preserves property 2 since both new children of z will have
at least a children of their own (this is where we use b ≥ 2a − 1) and
each will have rank s(x). (Property 2 implies that before the split, of
any two adjacent children of z, at least one has rank s(z) − 1.)

Case 3—Similar to but simpler than Case 2. Case 4 is symmetric. □

To make our timing analysis as similar as possible to the one in 
Ref. 8 for biased 2, b trees, we shall assume that credits can be divided 
in half, and that half a credit will pay for the work in one call of join, 
not counting the work in the recursive call. We use the following credit 
invariant: A nonroot node x holds s(p(x)) — s(x) — 1 credits, plus an 
additional half credit if x is underfilled. 


Theorem 4: The join algorithm runs in O(|s(x) — s(y)|) amortized 
time. Specifically, carrying out the join while preserving the invariant 
takes |s(x) — s(y)| + 1 credits in Case 1 or 2, |s(x) — s(y)| + 1/2 
credits in Case 3 or 4. 
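
The credit invariant and the allocations stated in Theorem 4 are simple enough to express directly. The following sketch uses exact fractions for the half credits; the function names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def credits_held(parent_rank: int, rank: int, underfilled: bool) -> Fraction:
    """Credits a nonroot node must hold: s(p(x)) - s(x) - 1, plus 1/2 if underfilled."""
    return Fraction(parent_rank - rank - 1) + (HALF if underfilled else 0)


def join_allocation(case: int, sx: int, sy: int) -> Fraction:
    """Credits allocated to a join of roots with ranks sx and sy (Theorem 4)."""
    if case not in (1, 2, 3, 4):
        raise ValueError("Case 0 joins are trivial and need no allocation")
    extra = Fraction(1) if case in (1, 2) else HALF
    return abs(sx - sy) + extra
```

A major filled node holds no credits, so credits accumulate only on minor nodes, which are exactly the places where a later join may have to do extra work.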

Proof: We use induction on rank and a surprisingly complicated case 
analysis. 

Case 1—Assume without loss of generality that s(x) ≥ s(y) and
that if s(x) = s(y), then x is external or filled. We need half a credit 
for the work in the join and at most s(x) — s(y) + 1/2 credits to place 
on y, for a total of s(x) — s(y) + 1. 

Case 2—The split of z, if it occurs, does not affect the credit
invariant; therefore we ignore it. Assume without loss of generality
that s(u) ≥ s(v). There are three subcases:

Subcase 2a—s(w) = s(x) and join (u, v) is a Case 1 join. The credits
originally on u and v suffice to maintain the credit invariant on u
and v after the join of x and y is completed. We have one credit on
hand to join x and y; we spend half for the work in the outer call
and half for the work in the inner call.

Subcase 2b—s(w) = s(x) and join (u, v) is not a Case 1 join. To
perform the join of x and y we are given one credit and can obtain
at least 2s(x) − s(u) − s(v) − 2 from u and v, for a total of 2s(x)
− s(u) − s(v) − 1. We need half a credit for the work in the
outermost call and at most s(u) − s(v) + 1 for the recursive call, for a
total of at most s(u) − s(v) + 3/2. The difference between what we
have and what we need is 2(s(x) − s(u)) − 5/2. Since s(x) − s(u)
≥ 1, we must find at most an additional half credit to spend.
If join (u, v) is a Case 2 join, and s(x) − s(u) = 1, then either u
or v is underfilled and yields an extra half credit. If join (u, v) is a
Case 3 join, we save half a credit on the join. Thus, in any case we
can obtain the needed half credit.

Subcase 2c—s(w) < s(x). As in Subcase 2b we have at least 2s(x)
− s(u) − s(v) − 1 credits to perform the join of x and y. We need
one half for the work in the outermost call, at most s(u) − s(v) +
1 for the recursive call, and at most s(x) − s(w) − 1/2 to place on
w, for a total of at most s(x) + s(u) − s(v) − s(w) + 1. The
difference between what we have and what we need is s(x) − s(u)
+ s(w) − s(u) − 2. Since s(x) − s(u) ≥ 1 and s(w) ≥ s(u), we have
enough credits unless s(x) − s(u) = 1 and s(w) = s(u), in which
case we must find an extra credit.

Suppose s(x) − s(u) = 1 and s(w) = s(u). Since s(w) = s(u),
join (u, v) is not a Case 1 join, and u is internal. If join (u, v) is a
Case 2 join, then both u and v are internal. Either both u and v are
underfilled, giving us two additional half credits, or one of u and v
is underfilled and w is filled, giving us an extra half credit from u or
v and saving a half credit that does not need to go on w. The only
other possibility is that join (u, v) is a Case 3 join, which saves us
half a credit. Furthermore in this case either u is underfilled or w is
filled, either giving us an extra half credit from u or saving us a half
credit on w. Thus in all cases we obtain the necessary extra credit.

Case 3—We ignore the possible split of x, which does not affect the
credit invariant. There are two subcases:

Subcase 3a—s(v) = s(x). To perform the join we are given s(x) −
s(y) + 1/2 credits and can obtain at least s(x) − s(u) − 1 more
from u, for a total of 2s(x) − s(u) − s(y) − 1/2. We need one half
for the outermost call and at most |s(u) − s(y)| + 1 for the recursive
call, for a total of |s(u) − s(y)| + 3/2. The difference between what
we have and what we need is 2(s(x) − max{s(u), s(y)}) − 2 ≥ 0.

Subcase 3b—s(v) < s(x). To perform the join we have at least 2s(x)
− s(u) − s(y) − 1/2 credits. We need |s(u) − s(y)| + 3/2 for the
outermost and recursive calls, plus at most s(x) − s(v) − 1/2 to
place on v, for a total of s(x) + |s(u) − s(y)| − s(v) + 1. The
difference between what we have and what we need is s(x) −
max{s(u), s(y)} + s(v) − max{s(u), s(y)} − 3/2. We have enough
credits unless s(x) − max{s(u), s(y)} = 1 and s(v) = max{s(u),
s(y)}, in which case we need an extra half credit.

Suppose s(x) − max{s(u), s(y)} = 1 and s(v) = max{s(u), s(y)}.
Then join (u, y) is not a Case 1 join. If it is a Case 2 join, then
either u is underfilled, giving us an extra half credit, or v is filled,
saving us a half credit on v. If it is a Case 3 join, we save half a
credit on the join. Thus in all cases we obtain the necessary half
credit.

Case 4—Symmetric to Case 3. □
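
The "have minus need" differences quoted in Subcases 2b, 2c, 3a, and 3b are plain algebraic identities, so they can be sanity-checked by brute force over small ranks. The sketch below does only that; it does not check the case analysis itself, and the variable names (`sx` for s(x), and so on) are chosen for this example.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)


def check_identities(max_rank: int = 6) -> bool:
    """Verify the credit differences stated in the proof of Theorem 4."""
    for sx, su, sv, sw, sy in product(range(max_rank), repeat=5):
        # Case 2: one credit given, plus credits recovered from u and v.
        have2 = 2 * sx - su - sv - 1
        need2b = HALF + (su - sv + 1)
        if have2 - need2b != 2 * (sx - su) - Fraction(5, 2):
            return False
        need2c = need2b + (sx - sw - HALF)          # extra credits placed on w
        if have2 - need2c != (sx - su) + (sw - su) - 2:
            return False
        # Case 3: here sv stands for the rank of v = join(u, y).
        m = max(su, sy)
        have3 = (sx - sy + HALF) + (sx - su - 1)
        need3a = HALF + abs(su - sy) + 1
        if have3 - need3a != 2 * (sx - m) - 2:
            return False
        need3b = need3a + (sx - sv - HALF)          # extra credits placed on v
        if have3 - need3b != (sx - m) + (sv - m) - Fraction(3, 2):
            return False
    return True
```

All four identities hold for every combination of ranks tried, which agrees with the arithmetic in the proof.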
It is useful to restate Theorem 4 as follows. We say a biased a, b 
tree with root x is cast to rank k if it satisfies the credit invariant and 
has k — s(x) credits on its root. Theorem 4 implies that if x and y are 
the roots of two biased a, b trees cast to a rank k > max{s(x), s(y)},
then they can be joined, without using additional credits, to produce a 
tree cast to rank k. 

We can implement a split as a sequence of joins, exactly as described 
in Ref. 8. The following algorithm splits at an item i in the tree:

Algorithm 2: split (i, r). Split the biased a, b tree with root r at item i,
assumed to be in the tree.

Locate the node x containing item i. Initialize the current node to
be p(x) and the previous node to be x. Initialize the left and right trees
to empty; they will contain the items smaller than and larger than i,
respectively. Repeat the following step until the current node is null 
(the previous node is the root of the tree): 

Split Step—Delete every child of the current node to the left of the 
previous node. If there is one such child, join it to the left tree; if there 
are two or more such children, give them a common parent and join 
the resulting tree to the left tree. Repeat this process with the children 
to the right of the previous node, joining the resulting tree to the right 
tree. Remove the previous node as a child of the current node and 
destroy it if it is not x. Make the current node the new previous node 
and its parent the new current node. 


Theorem 5: The split algorithm is correct and takes O(s(r) − s(x))
credit time, where x is the node containing item i.

Proof: Correctness follows immediately from the correctness of the
join algorithm. The time bound follows as in Ref. 8; we shall sketch
the idea. Let cur, prev, left, and right be the current node, the previous
node, and the roots of the left and right trees, respectively. An
induction shows that s(prev) ≥ max{s(left), s(right)} just before each
split step. Another induction shows that by allocating O(s(cur) −
s(prev)) credits to a split step, we can carry out the step while
preserving the invariant that the left and right trees are cast to a rank
of s(prev) + 1. That is, the amortized time associated with a single
step of the split is proportional to the rank difference of two
consecutive nodes along the split path. If we sum over all split steps, then
the sum telescopes, and we obtain the time bound in the statement of
the theorem. □
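
To make the structure of the split steps concrete, the sketch below walks up the path from the node x to the root and collects, at each step, the groups of siblings lying to the left and to the right of the path. It deliberately omits the joins and the credit accounting: it only shows which pieces Algorithm 2 would feed to join. The nested-list encoding and the `path` argument (child indices from the root down to x) are assumptions made for this example.

```python
from typing import List, Tuple, Union

# Assumed encoding: a str is an item in an external node,
# a list holds the children of an internal node.
Tree = Union[str, list]


def split_pieces(tree: Tree, path: List[int]) -> Tuple[Tree, List[list], List[list]]:
    """Return (x, left groups, right groups), groups listed bottom-up.

    Each group is the list of siblings deleted on one side in one split step.
    In Algorithm 2 a group of one child is joined directly, and a larger
    group is first given a common parent.
    """
    nodes = [tree]
    for index in path:                        # descend to the node x
        nodes.append(nodes[-1][index])
    left: List[list] = []
    right: List[list] = []
    for depth in range(len(path) - 1, -1, -1):
        current, index = nodes[depth], path[depth]
        if current[:index]:
            left.append(current[:index])      # children left of the previous node
        if current[index + 1:]:
            right.append(current[index + 1:]) # children right of the previous node
    return nodes[-1], left, right


tree = [["a", "b", "c"], "d", ["e", "f"]]
print(split_pieces(tree, [0, 1]))
```

One split step is performed per node on the path, which is why the per-step costs s(cur) − s(prev) telescope to s(r) − s(x).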

Splitting at an item not in the tree is just like splitting at an item
in the tree, except that the initialization is different. Let r be the root
of the tree, i the item at which the split is to take place, $i^-$ and $i^+$ the
largest item in the tree less than i and the smallest item in the tree
greater than i, respectively, and x the handle of i, defined to be the
nearest common ancestor of the external nodes containing $i^-$ and $i^+$.
We can locate x by searching down the path from r to x if appropriate
keys are stored in the tree (see Ref. 8). To carry out the split, we
combine all children of x whose descendants contain items less than i
to form the initial left tree and all other children of x to form the
initial right tree. Then we initialize the previous and current nodes to
be x and p(x), respectively, and repeat the split step until the current
node is null. Such a split also runs in O(s(r) − s(x)) credit time.

We can implement insert, delete, and reweight as combinations of 
splits and joins: an insertion is a two-way split followed by two joins, 
a deletion is a three-way split followed by one join, and a weight 
change is a three-way split followed by two joins. Using the same kind 
of analysis as in Ref. 8, we obtain the following credit times for these
three operations (we have stated the bounds in terms of weights rather
than ranks):

insert (i, x): $O\left(\log_a \frac{w(x) + w_i}{\min\{w_{i^-} + w_{i^+},\; w_i\}}\right)$,

where $i^-$ and $i^+$ are as defined above.

delete (i, x): $O\left(\log_a \frac{w(x)}{w_i}\right)$,

reweight (i, w, x): $O\left(\log_a \frac{w(x) + w}{\min\{w_i,\; w\}}\right)$,

where w(x) and $w_i$ are as defined before the weight change.


Remark: In all the time bounds derived in this section and the next,
the constant factor is proportional to b. □


III. GLOBALLY BIASED a, b TREES


Local bias is sufficient to guarantee good amortized but not good 
worst-case running times for the dictionary operations. If it is impor- 
tant that every single operation be fast, we need a stronger balance 
condition. Figure 3 shows how a split can take more actual time than 





Fig. 3—Locally biased a, b tree that cannot be split at x in actual time O(s(r) — 
s(x)). Not all children of y, z, and v are shown. Join of u and v can take an unbounded 
amount of time. 
its credit time, and illustrates why local bias is not enough: a minor
node that is the leftmost child of its parent has a constraint on its 
right but not on its left side, and symmetrically for a minor node that 
is the rightmost child of its parent. To overcome this problem we 
introduce globally biased a, b trees. 

If x is a node in a search tree, we define $x^+$ to be the external node
containing the smallest item greater than the largest item in a
descendant of x. Symmetrically, $x^-$ is the external node containing the largest
item less than the smallest item in a descendant of x. If x is on the
rightmost path of the tree, $x^+$ is undefined; if x is on the leftmost path,
$x^-$ is undefined. A globally biased a, b tree is a search tree with two
properties (see Fig. 4):

1. Every internal node has at least two and at most b children.

2. If x is a minor nonroot node and $x^+$ is defined, $s(x^+) \ge s(p(x)) - 1$;
if $x^-$ is defined, $s(x^-) \ge s(p(x)) - 1$. When a node x has this
property, we say the tree is globally biased at x.
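
A hedged sketch of a checker for the two properties follows. It assumes the same nested-list encoding as before (an int is the weight of an external node), reads property 2 as a lower bound on the ranks of the neighbouring external nodes, and locates those neighbours by indexing the external nodes in symmetric order. All names are hypothetical.

```python
from typing import List, Tuple, Union

Tree = Union[int, list]   # int = external node weight, list = children


def floor_log(w: int, a: int) -> int:
    """Largest s with a**s <= w, for integer w >= 1."""
    s = 0
    while a ** (s + 1) <= w:
        s += 1
    return s


def globally_biased(tree: Tree, a: int, b: int) -> bool:
    leaf_ranks: List[int] = []                  # ranks of external nodes, left to right
    minor: List[Tuple[int, int, int]] = []      # (lo, hi, parent rank) per minor node
    ok = True

    def visit(t: Tree) -> Tuple[int, int, int, bool]:
        """Return (rank, lo, hi, filled); lo..hi index t's external descendants."""
        nonlocal ok
        if isinstance(t, int):
            leaf_ranks.append(floor_log(t, a))
            i = len(leaf_ranks) - 1
            return leaf_ranks[i], i, i, True
        if not 2 <= len(t) <= b:                # property 1
            ok = False
        infos = [visit(c) for c in t]
        rank = 1 + max(info[0] for info in infos)
        for r, lo, hi, is_filled in infos:
            if r < rank - 1 or not is_filled:   # this child is minor
                minor.append((lo, hi, rank))
        return rank, infos[0][1], infos[-1][2], len(t) >= a

    visit(tree)
    for lo, hi, parent_rank in minor:           # property 2
        if lo > 0 and leaf_ranks[lo - 1] < parent_rank - 1:
            ok = False                          # x^- is too light
        if hi + 1 < len(leaf_ranks) and leaf_ranks[hi + 1] < parent_rank - 1:
            ok = False                          # x^+ is too light
    return ok
```

With a = 2 and b = 3, the tree `[[4, 1], [1, 4]]` is locally but not globally biased: the two light external nodes are neighbours in symmetric order even though they are not siblings. Reordering the second subtree to give `[[4, 1], [4, 1]]` restores global bias.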

Global bias implies local bias. Thus globally biased a, b trees have 
ideal access time. We can join two globally biased a, b trees using 
almost the same join algorithm as in Section II; the only difference is 
that the conditions determining the cases are different. We shall call
the algorithm in Section II local join to distinguish it from the following 
global join algorithm: 


Algorithm 3: global join (x, y). Join two globally biased a, b trees with 
roots x and y, assuming that all items in tree x precede all items in 
tree y. 

In each case, proceed as in the corresponding case of the local join 
algorithm: 

Case 0—x = null or y = null.

Case 1—s(x) ≥ s(y) and x is external, or s(x) ≤ s(y) and y is
external.

Case 2—s(x) = s(y) and both x and y are internal.

Case 3—s(x) > s(y) and x is internal.

Case 4—s(x) < s(y) and y is internal.

Fig. 4—Two biased 2, 3 trees with the same external nodes. Numbers in nodes are
ranks. (a) Locally biased tree. (b) Globally biased tree.

The only difference between this and the local join algorithm is that 
if x and y have the same rank and both are internal with at least a 
children, we apply Case 2 instead of Case 1. The algorithm is identical 
to the join algorithm for globally biased 2, b trees given in Ref. 8. 
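
The case conditions of the global join can be written as a small classifier. This is a sketch only: the `Root` record is hypothetical, and the rank comparisons in Case 1 are read as non-strict inequalities so that the cases cover every pair of roots, which is an assumption about the intended conditions.

```python
from typing import NamedTuple, Optional


class Root(NamedTuple):
    rank: int        # s(x)
    children: int    # number of children; 0 marks an external node

    @property
    def external(self) -> bool:
        return self.children == 0


def global_join_case(x: Optional[Root], y: Optional[Root]) -> int:
    """Return the number (0-4) of the case of global join that applies."""
    if x is None or y is None:
        return 0
    if (x.rank >= y.rank and x.external) or (x.rank <= y.rank and y.external):
        return 1
    if x.rank == y.rank:
        return 2                 # both roots are internal here
    return 3 if x.rank > y.rank else 4
```

Unlike the local join, this classifier never consults the parameter a: two filled internal roots of equal rank go to Case 2, so the recursion always continues until an external node is reached.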


Theorem 6: The global join algorithm is correct.


Proof: As in the proof of Theorem 3, we use induction on rank and a 
case analysis. 

Case 1—Immediate. 

Case 2—Ignore the split of z if it takes place. Let q be the left sibling
of u (recall that u is the rightmost child of x). Suppose q or one of the
nodes on the rightmost path descending from q is minor. Let this
minor node be r. The join changes neither p(r) nor $r^+$ and thus
preserves global bias at r. The symmetric statement holds for the right
sibling of v (recall that v is the leftmost child of y). Suppose w (the
join of u and v) or one of the nodes on the leftmost path descending
from w is minor. Let this minor node be r. There must be a
corresponding minor node r′ on the leftmost path descending from u, such that
s(p(r)) ≤ s(p(r′)). Since the original tree is globally biased at r′, the
new tree must be globally biased at r. The symmetric statement holds
for the rightmost path descending from w. Thus the new tree is globally
biased before the split. The split preserves global bias.

Case 3—Similar to but simpler than Case 2. Case 4 is symmetric. □


Theorem 7: The worst-case running time of the global join algorithm is
O(max{s(x), s(y)} − max{s(u), s(v)}), where u is the rightmost
external descendant of x and v is the leftmost external descendant of y.


Proof: The global join algorithm descends rank by rank concurrently
along the rightmost path descending from x and the leftmost path
descending from y, until reaching an external node; then it ascends.
The theorem follows. □

We split a globally biased a, b tree in exactly the same way as a
locally biased a, b tree, using local rather than global joins. This method 
not only produces globally biased trees, it has a worst-case time bound 
equal to the amortized bound given in Theorem 5. 


Theorem 8: Algorithm 2 (or its variant for an item not in the tree) 
correctly splits a globally biased a, b tree with root r at a node x in 
O(s(r) — s(x)) worst-case time. 


Proof: The proof is the same as the corresponding proof for globally 
biased 2, b trees given in Ref. 8. For completeness, we sketch it here.
Let $x_1, x_2, \ldots, x_k$ be the roots of the trees joined to form the final left
tree. Let $y_1 = x_1$ and for $i = 2, \ldots, k$ let $y_i$ be the root of the tree
formed by executing join ($x_i$, $y_{i-1}$). With this definition $y_k$ is the final
left tree. Each node $x_i$ is either a child of an ancestor of x, say $a_i$, in
the original tree, or the newly constructed parent of a set of children
of such a node $a_i$; furthermore $a_{i+1}$ is a proper ancestor of $a_i$ for
$i = 1, \ldots, k-1$. Consider a node $x_i$ for $i \ge 2$. If $x_i$ is a child of $a_i$, global bias
implies $x_i$ is a major child, for otherwise its right sibling is external,
which is impossible since this right sibling has $x_1$ or its children as
descendants. Thus $s(x_i) = s(a_i) - 1$. If $x_i$ is the new parent of children
of $a_i$, then $s(x_i) = s(a_i)$. It follows that $s(x_i) \le s(x_{i+1})$ for $i = 1, \ldots,
k-1$. As noted in the proof of Theorem 5, an induction shows that
$s(y_i) \le s(a_i)$ for $i = 1, \ldots, k$: if $i \ge 2$ and $s(x_i) = s(a_i)$, $x_i$ has at most
b − 1 children, and the join of $x_i$ and $y_{i-1}$ cannot split $x_i$. Thus if $i \ge 2$,
$s(a_i) - 1 \le s(x_i) \le s(y_i) \le s(a_i)$.

Consider the join of $x_{i+1}$ and $y_i$. The join will descend rank by rank
along the rightmost path from $x_{i+1}$ and the leftmost path from $y_i$.
Global bias in the original tree implies that if the descent from $x_{i+1}$
encounters a minor node z (other than the root of $x_{i+1}$), the leftmost
external node of $y_i$ will have rank at least s(p(z)) − 1 and the join will
immediately terminate. Thus the join either terminates by reaching
an external descendant of $x_{i+1}$ of rank at least $s(y_i)$, in which case the
global bias of the joined tree is immediate, or it reaches an internal
descendant, say z, of $x_{i+1}$ of rank exactly $s(y_i)$. Now we need a similar
but more complicated statement about the leftmost path from $y_i$.
There are several cases.

Case 1—$s(x_i) < s(a_i)$ and $x_i$ is minor in the original tree. This can
only happen if i = 1, i.e., $x_i = y_i$. Global bias implies that the rightmost
external descendant of $x_{i+1}$ has rank at least $s(a_i) - 1$. The join
descends along the path from $x_{i+1}$ to this external node and then
ascends.

Case 2—$s(x_i) < s(a_i)$, $x_i$ is major, and $s(x_i) = s(y_i)$. In this case $y_i$
is also major and the leftmost paths descending from $x_i$ and $y_i$, not
including $x_i$ and $y_i$ themselves, are identical. If z is underfilled, $y_i$ is
external, and the join terminates in Case 1. If z is filled, $y_i$ is either
external or filled, and the join also terminates in Case 1.

Case 3—$s(x_i) < s(a_i)$, $x_i$ is major, and $s(x_i) < s(y_i)$. In this case the
left child of $y_i$ is major and the join stops descending either at rank
$s(y_i)$ (if $y_i$ is external or filled) or at rank $s(y_i) - 1$ (if $y_i$ is internal).

Case 4—$s(x_i) = s(a_i)$. In this case the leftmost paths descending
from $x_i$ and $y_i$, not including $x_i$ and $y_i$ themselves, are identical. If $y_i$ is
external or filled, the join will stop descending at rank $s(y_i)$. If $y_i$ is
underfilled, the join will stop at rank $s(y_{i-1})$; either the rightmost child
of z or the leftmost child of $y_i$ is external of rank $s(y_{i-1})$, or both are
filled and of rank $s(y_{i-1})$.

In all cases the global bias of the original tree implies that the joined
tree is globally biased. This means that the final left tree is globally
biased. Furthermore, the time required for joining $x_{i+1}$ and $y_i$ is
$O(s(a_{i+1}) - s(a_i))$ in all cases. Summing over all joins gives a time
bound of O(s(r) − s(x)) to form the final left tree. A symmetric
argument applies to the right tree. □

If we implement insertion, deletion, and weight change as described 
in Section II, we obtain the following worst-case time bounds for the 
various operations on globally biased a, b trees (see Ref. 8): 


split (i, r): $O\left(\log_a \frac{w(r)}{w_i}\right)$

if i is in the tree, or

$O\left(\log_a \frac{w(r)}{w_{i^-} + w_{i^+}}\right)$

if i is not in the tree, where $i^-$ and $i^+$ are the items before and after i
in the tree, respectively.

insert (i, x): $O\left(\log_a \frac{w(x)}{w_{i^-} + w_{i^+}} + \log_a \frac{w(x) + w_i}{w_i}\right)$,

delete (i, x): $O\left(\log_a \frac{w(x)}{w_i} + \log_a \frac{W'}{w_{i^-} + w_{i^+}}\right)$,

where W′ is the weight of the tree root after the deletion.

reweight (i, w, x): $O\left(\log_a \frac{w(x)}{w_i} + \log_a \frac{W'}{w}\right)$,

where w(x) and $w_i$ are as defined before the weight change, and W′ is
the weight of the tree root after the change.

As compared with the amortized time bounds for locally biased a, b 
trees, the worst-case bounds for globally biased a, b trees are larger for 
join and delete and the same for the other operations. 

We conclude our discussion of biased a, b trees with two remarks. 
First, if b = 2a we can implement either local or global join in an 
iterative, purely top-down fashion by preemptively splitting nodes 
with b children as they are encountered. By extending this idea we 
can implement all the operations top-down. This is a reason to choose 
b = 2a over b = 2a — 1. 

Second, for appropriate large values of a and b, biased a, b trees 
offer a biased alternative to B-trees. One of the advantages of B-trees 
is that there are no underfilled nodes except tree roots; thus if nodes 
are stored one node per page in fixed-size pages, the storage efficiency
is at least 50 percent, not counting root pages. Biased a, b trees do not 
share this property. We leave open the problem of devising a space- 
efficient version of biased a, b trees. 


IV. PSEUDO-WEIGHT-BALANCED TREES 


In biased a, b trees, we maintain balance through a height constraint. 
However, there are other possible balance constraints, such as weight 
balance. Nievergelt and Reingold‘ defined trees of bounded balance by 
imposing upper and lower bounds on the ratio leftweight/rightweight 
at each internal node, where the left and right weights count the 
number of items in the left and right subtrees, respectively. Bent 
developed a biased version of weight-balanced trees.° However, his 
data structure suffers from a complicated seven-case join algorithm 
that needs up to three recursive calls and also uses rebalancing 
rotations more complicated than standard single and double rotations. 
In this section we introduce a form of biased weight-balanced trees 
much simpler than Bent’s. Our simplification comes from two new 
ideas: we discretize the weights and allow arbitrarily bad imbalance in 
some situations where balancing is possible. We call our trees pseudo- 
weight-balanced. 

We consider binary trees, in which each internal node x has exactly
two children, a left child l(x) and a right child r(x). As in Section II
we define the weight w(x) of a node x by w(x) = $w_i$ if x is an external
node containing item i, w(x) = w(l(x)) + w(r(x)) if x is an internal
node. We define the rank s(x) of a node x differently:
s(x) = ⌊lg w(x)⌋. We call a binary search tree pseudo-weight-balanced
(pwb) if it has two properties: 

1. If three nodes in a row, say x, p(x), and p(p(x)), have the same 
rank, then x is external and either x is a left child and p(x) a right 
child or vice-versa (see Fig. 5a). 

2. If x and l(x) (symmetrically x and r(x)) are internal nodes of the
same rank, then w(r(l(x))) + w(r(x)) (symmetrically w(l(r(x))) +
w(l(x))) is at least $2^{s(x)}$ (see Fig. 5b). (This property allows us to
do single rotations when necessary to maintain property 1.)
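
The two properties can be tested with a short recursive checker. The sketch below assumes positive integer weights, encodes an external node as an int and an internal node as a (left, right) pair, and takes the threshold in property 2 to be 2 raised to the rank of x, which is this example's reading of the condition. Weights are recomputed recursively for brevity.

```python
from typing import Optional, Tuple, Union

# Assumed encoding: an int is the weight of an external node,
# a 2-tuple (left, right) is an internal node.
BinTree = Union[int, Tuple["BinTree", "BinTree"]]


def weight(t: BinTree) -> int:
    return t if isinstance(t, int) else weight(t[0]) + weight(t[1])


def rank(t: BinTree) -> int:
    """s(x) = floor(lg w(x)) for integer weights >= 1."""
    return weight(t).bit_length() - 1


def is_pwb(t: BinTree, side: Optional[str] = None, parent_rank: Optional[int] = None,
           parent_side: Optional[str] = None, grand_rank: Optional[int] = None) -> bool:
    s = rank(t)
    # Property 1: three equal ranks in a row need an external x on a zigzag path.
    if parent_rank == s and grand_rank == s:
        if not isinstance(t, int) or side == parent_side:
            return False
    if isinstance(t, int):
        return True
    left, right = t
    # Property 2 and its mirror image.
    if not isinstance(left, int) and rank(left) == s:
        if weight(left[1]) + weight(right) < 2 ** s:
            return False
    if not isinstance(right, int) and rank(right) == s:
        if weight(right[0]) + weight(left) < 2 ** s:
            return False
    return (is_pwb(left, "L", s, side, parent_rank)
            and is_pwb(right, "R", s, side, parent_rank))
```

For example, `((1, 4), 1)` is accepted, since the heavy external node lies on a zigzag path, while its mirror-free variant `((4, 1), 1)` is rejected.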

The following result bounds the access time in pwb trees.


Theorem 9: A pwb tree has ideal access time for all items. Specifically, 
if x is an external node containing item i in a tree of total weight W, 
then the depth of x is at most 2 lg(W/w_i) + 3.

Proof: Let r be the root of the tree and d the depth of x. According to
property 1, after the first step up from x, every two steps taken along
the path from x to r must cause a rank increase. Thus ⌊(d − 1)/2⌋ ≤
s(r) − s(x) = ⌊lg W⌋ − ⌊lg w(x)⌋, which implies d ≤ 2⌊(d − 1)/2⌋ + 1
≤ 2(lg W − lg w(x) + 1) + 1 = 2 lg(W/w(x)) + 3. □

Fig. 5—Legal configurations in a pseudo-weight-balanced tree. (a) Three nodes of
the same rank in a row. (b) Two internal nodes of the same rank in a row. 


We join two pwb trees using the following algorithm:


Algorithm 4: join (x, y). Join two pwb trees with roots x and y, assuming
that all items in tree x precede all items in tree y.

Case 0—x = null or y = null. Return y if x = null or x if y = null.

Case 1—lg(w(x) + w(y)) ≥ 1 + max{s(x), s(y)} or the heavier of x
and y is an external node. Return a new root with left child x and
right child y.

Case 2—s(x) > s(y) and lg(w(x) + w(y)) < 1 + s(x) and x is not
external. If r(x) is external, or internal of rank at most s(x) − 1,
replace r(x) by join (r(x), y). Otherwise, perform a single left rotation
at x and replace the right child r(u) of the new root u by join (r(u),
y) (see Fig. 6).

Case 3—s(x) < s(y) and lg(w(x) + w(y)) < 1 + s(y). Symmetric
to Case 2.

Remark: Although the join algorithm is stated recursively, it is easy to 
implement it in an iterative, purely top-down fashion, since the rank 
of a node depends only on the total weight of its descendants and not 
on their arrangement. 
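
The four cases of Algorithm 4 can be sketched in Python as follows. This is a minimal recursive sketch, not the iterative top-down version mentioned in the remark. It assumes positive integer weights, so that the rank ⌊lg w⌋ can be computed with `int.bit_length`; the names `Node`, `rank`, `link`, and `inorder` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    weight: int                       # w(x): item weight, or sum of the children
    item: Optional[str] = None        # set on external nodes only
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def external(self) -> bool:
        return self.left is None


def rank(n: Node) -> int:
    # s(x) = floor(lg w(x)); exact for positive integer weights
    return n.weight.bit_length() - 1


def link(a: Node, b: Node) -> Node:
    return Node(a.weight + b.weight, None, a, b)


def join(x: Optional[Node], y: Optional[Node]) -> Optional[Node]:
    if x is None or y is None:                        # Case 0
        return y if x is None else x
    top = (x.weight + y.weight).bit_length() - 1      # floor(lg(w(x) + w(y)))
    heavier = x if x.weight >= y.weight else y
    if top >= 1 + max(rank(x), rank(y)) or heavier.external:   # Case 1
        return link(x, y)
    if rank(x) > rank(y):                             # Case 2
        u = x.right
        if not u.external and rank(u) == rank(x):
            # single left rotation at x; u becomes the new root
            x.right = u.left
            x.weight = x.left.weight + x.right.weight
            u.left = x
            x = u
        x.right = join(x.right, y)
        x.weight = x.left.weight + x.right.weight
        return x
    u = y.left                                        # Case 3 (mirror image)
    if not u.external and rank(u) == rank(y):
        y.left = u.right
        y.weight = y.left.weight + y.right.weight
        u.right = y
        y = u
    y.left = join(x, y.left)
    y.weight = y.left.weight + y.right.weight
    return y


def inorder(n: Node) -> list:
    return [n.item] if n.external else inorder(n.left) + inorder(n.right)


# Build a tree by joining single-item trees from left to right.
tree = None
for name, w in [("a", 1), ("b", 40), ("c", 2), ("d", 3),
                ("e", 100), ("f", 1), ("g", 7)]:
    tree = join(tree, Node(w, name))
```

Equal ranks always fall into Case 1, because two weights of rank k sum to at least 2^{k+1}; the final branch is therefore reached only when s(x) < s(y).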

Theorem 10: Algorithm 4 produces a pwb tree of rank max{s(x), s(y)} 
or 1+ max{s(x), s(y)}. 

Proof: By induction on the depth of the recursion. The definition of
rank implies that the rank of the new tree is max{s(x), s(y)} or 1 +
max{s(x), s(y)}. In the latter case, the join must have executed Case
1 and both children of the new root must have rank smaller than the
root's rank. Case 1 obviously constructs a tree with properties 1 and
2. In Case 2, property 2 guarantees that the single rotation, if it occurs,
creates a pwb tree. If the tree produced by the recursive join has rank
less than s(x), the overall joined tree clearly has properties 1 and 2.
This is also true if the tree produced by the recursive join has rank
s(x), by the observation above. □

Fig. 6—Case 2 of the join algorithm for pwb trees. (a) Node u = r(x) external or s(u)
< s(x): no rotation. (b) Node u internal and s(u) = s(x): rotation.

In analyzing the running time of Algorithm 4, we use the following
credit invariant: Any node x contains max{0, s(p(x)) − s(x) − 1}
credits.

Theorem 11: Algorithm 4 runs in O(|s(x) − s(y)|) = O(lg((w(x) +
w(y))/min{w(x), w(y)})) amortized time. Specifically, if we assume
without loss of generality that s(x) ≥ s(y), performing the join while
maintaining the credit invariant takes at most s(x) − s(y) + 1 credits.

Proof: We consider the same cases as in the algorithm.

Case 1—We need one credit to build the new tree and either s(x) −
s(y) or s(x) − s(y) − 1 to establish the invariant on y, for a total of
at most s(x) − s(y) + 1.

Case 2—Suppose the single rotation does not take place. We have
on hand s(x) − s(y) + 1 tokens for performing the join plus s(x) −
s(r(x)) − 1 from r(x), for a total of 2s(x) − s(y) − s(r(x)). We need
one token for performing the outermost call of join plus max{s(r(x)),
s(y)} − min{s(r(x)), s(y)} + 1 for the recursive call plus s(x) −
max{s(r(x)), s(y)} − 1 to establish the invariant on the root of the
tree returned by the recursive call, for a total of s(x) − min{s(r(x)),
s(y)} + 1 ≤ 2s(x) − s(y) − s(r(x)), since s(x) > max{s(r(x)), s(y)}.
Exactly the same argument applies if the rotation does take place,
since the rotation preserves the credit invariant. □
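
The Case 2 token count can be checked mechanically. The sketch below only restates the two totals from the proof as Python functions (the function names are illustrative) and confirms the inequality by brute force over small ranks with s(x) > max{s(r(x)), s(y)}.

```python
from itertools import product


def credits_needed(sx: int, sy: int, srx: int) -> int:
    # one for the outer call + the recursive join + restoring the invariant
    hi, lo = max(srx, sy), min(srx, sy)
    return 1 + (hi - lo + 1) + (sx - hi - 1)


def credits_on_hand(sx: int, sy: int, srx: int) -> int:
    # budget for the join plus the credits stored on r(x)
    return (sx - sy + 1) + (sx - srx - 1)


inequality_holds = all(
    credits_needed(sx, sy, srx) <= credits_on_hand(sx, sy, srx)
    for sx, sy, srx in product(range(12), repeat=3)
    if sx > max(srx, sy)
)
```

The needed total simplifies to s(x) − min{s(r(x)), s(y)} + 1, so the inequality reduces to s(x) ≥ max{s(r(x)), s(y)} + 1, which is exactly the side condition.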

The algorithm for splitting a pwb tree is almost identical to but
simpler than the algorithm for splitting a biased a, b tree.

Algorithm 6: split (x, r). Split a pwb tree with root r at a node x.

Initialize the current node cur, the previous node prev, and the left
and right nodes left and right to be p(x), x, l(x), and r(x), respectively.
Repeat the following step until cur = null:

Split step—If prev = l(cur), replace right by join (right, r(cur));
otherwise, replace left by join (l(cur), left). Remove prev as a child of
cur and destroy it if it is not x. Replace prev and cur by cur and p(cur),
respectively.

This algorithm is the same as that described by Bent, Sleator, and 
Tarjan? for splitting biased binary trees, and indeed will work for any 
class of binary search trees for which a join algorithm is known. 
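
Because the split loop depends only on the existence of a join, it can be sketched with a stand-in join that ignores balance. The Python below is such a sketch: `join` simply hangs both trees under a new root (an assumption made for brevity, not Algorithm 4), `Node` keeps a parent pointer, and `split` takes only x because the root is reached through the parent pointers. All names are illustrative.

```python
class Node:
    def __init__(self, item=None, left=None, right=None):
        self.item, self.left, self.right, self.parent = item, left, right, None
        for child in (left, right):
            if child is not None:
                child.parent = self


def join(a, b):
    # Stand-in join (assumption): no rebalancing, just a new root.
    if a is None or b is None:
        return b if a is None else a
    return Node(left=a, right=b)


def split(x):
    cur, prev = x.parent, x
    left, right = x.left, x.right
    while cur is not None:
        up = cur.parent                  # p(cur), read before children are relinked
        if prev is cur.left:
            right = join(right, cur.right)
        else:
            left = join(cur.left, left)
        prev, cur = cur, up
    for root in (left, right):
        if root is not None:
            root.parent = None
    return left, right


def items(n):
    if n is None:
        return []
    return [n.item] if n.left is None else items(n.left) + items(n.right)


a, b, c, d, e = (Node(ch) for ch in "abcde")
root = Node(left=Node(left=a, right=b),
            right=Node(left=c, right=Node(left=d, right=e)))
before, after = split(c)   # items before c, items after c
```

Replacing the stand-in with a balanced join changes only the shape of the two result trees, not which items end up in each.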


Theorem 12: The amortized time of split (x, r) is


O(lg(W/w_i)).


More precisely, performing the split while maintaining the credit
invariant takes O(s(r) − s(x)) credits, where x is the node containing
item i.

Proof: The definition of ranks implies that s(prev) ≥ max{s(left),
s(right)} before each split step. An easy induction shows that if we
allocate O(s(cur) − s(prev) + 1) credits to each split step, we can
carry out the step while maintaining the credit invariant on the trees
left and right and in addition maintaining 2s(prev) − s(left) − s(right)
credits on hand. Summing over all split steps and using property 2
gives the theorem.

Using the appropriate combinations of join and split, we obtain the
same amortized time bounds as in Section II (with binary logarithms)
for insertion, deletion, and weight change in pwb trees. Pseudo-weight-
balanced trees are a very simple version of locally biased trees,
competitive with the biased binary trees presented in Ref. 8. We have been
unable to devise a globally biased version of pwb trees and leave this
as an open problem.

In this paper we present a theory of relational database systems based on
the partition lattice, which represents a new mathematical approach to the 
structure of relational database systems. A partition lattice can be defined for 
any given relation. This partition lattice is shown to be a meet-morphic image 
of the Boolean algebra of subsets of the attribute set. The partial ordering in 
the lattice is proved to be equivalent to the concept of functional dependency, 
and thus Armstrong’s axioms for functional dependencies are proved. We 
solve the problem of finding the list of all keys by seeking the prime implicants 
of the Boolean function associated with the principal ideals generated by the 
attributes. We demonstrate the properties of the Boyce-Codd Normal Form 
(BCNF), and give a modified algorithm for synthesizing an information- 
lossless BCNF based on the principal filter. The necessary and sufficient 
conditions for multivalued dependency (MVD) are given in terms of a lattice 
equation, and the inference rules of MVD are proved. The necessary and 
sufficient conditions for join dependency (JD) are given; consequently, we can 
prove the known result that acyclic join dependency (AJD) is equivalent to a 
set of MVDs. The concept of data independence is introduced, and is extended 
to conditional independence and mutual independence. We established this 
algebraic theory of relational databases in the same spirit that the theory of 
probability was constructed. We present a comparison that demonstrates the 
similarities. 


I. INTRODUCTION


The existing theory of relational databases is based on Codd’s
relational model of data.’® This relational database theory can be
considered to be the study of data dependencies (or independencies).
The theory was initiated by Codd with the introduction of the concept
of functional dependency; Codd observed that this concept can be used 
to design better, normalized, database schemes. The advantage of 
normalized database schemes is that they remove the possibility of 
updating anomalies caused by undesirable data dependencies.” ° 

In the existing theory of logical database design, functional
dependencies are input constraints that must always hold in the relation.® In
the present paper, however, we take a different approach. We assume
that for a particular database designer, there exists a (finite) universal
relation R[Ω] for a given set of attributes Ω, such that any relation T
on Ω is a subset of R[Ω]. Furthermore, each subset X of Ω corresponds
to an equivalence relation (partition) on the set of tuples of R[Ω].
That is, if two tuples in R[Ω] have the same X value, then they are in
the same equivalence class. With this approach, the concept of
functional dependency becomes equivalent to the refinement partial
ordering of the partition lattice. The partitions on the (finite) set of
tuples of the universal relation R[Ω] can then be considered as the
fundamental constraints, from which the functional dependencies
(partial ordering) can be derived. Consequently, with our approach,
the functional dependencies are inherent properties of the universal
relation R[Ω]. The input constraints of course must be consistent with
the inherent properties within the database.

Another kind of data dependency, proposed by Fagin’ and Zaniolo,® 
is the multivalued dependency, which includes functional dependency 
as a special case. Multivalued dependency is the necessary and suffi- 
cient condition for the lossless-join decomposition of a relation into 
two subrelations, such that the original relation can be regenerated by 
the (natural) join operation.”!! Using the partition lattice we propose, 
we can formulate multivalued dependency as a lattice equation (see 
Section VI). We show that the axioms for functional dependencies” 
and the inference rules of multivalued dependencies” can all be proved 
as theorems within the framework of partition lattice theory. We show 
how the concept of join dependency’”""* is connected to multivalued 
dependency. We give the necessary and sufficient condition for join 
dependency and, consequently, we can prove the known result that 
the acyclic join dependency is equivalent to a set of MVDs.**"® We 
also introduce the concept of data independence, and the extension to 
conditional independence and mutual independence of sets of attri- 
butes. 

The problem of listing all the keys of a relation is solved by using 
the concept of principal ideals in lattice theory. One form of a relation 
having desirable properties is the Boyce-Codd Normal Form (BCNF); 
we show that the concept of the principal filter (dual ideal) can be 
used to produce information-lossless Boyce-Codd Normal Forms.

Both the theoretical foundation and the practical application of the
existing theory of relational databases appear to be fragmented. This 
paper shows that all the diverse kinds of data dependencies can be 
formulated within the lattice theory, which has the important advan- 
tage of unifying the theory of relational databases into a coordinated 
whole. Because of this, it would appear that future work in relational 
databases should be conducted using lattice theory as the basic frame- 
work. 

The establishment of this algebraic theory of relational databases is 
done in the same spirit as the construction of the theory of probability, 
although probability theory is of course unrelated to database theory. 
We are convinced that the lattice theory could play a role in the theory 
of relational databases similar to the role that measure theory plays 
in the theory of probability.” 

The basic notion of relational databases is defined in Section II, 
and the partition lattice of the relation is introduced in Section III. 
The problem of listing all keys is solved in Section IV, where the 
Boolean functions associated with the principal ideals are defined. 
The properties of the Boyce-Codd Normal Form are studied in Section 
V, where we present a modified algorithm for synthesizing informa- 
tion-lossless BCNFs based on the principal filters. Section VI is 
devoted to the proof of equivalence between multivalued dependency 
and a lattice equation. Section VII discusses join dependency and 
acyclic join dependency. Finally, in Section VIII we outline a possible 
direction for future research, as well as a comparison that shows the 
similarities between probability theory and the algebraic theory of 
relational databases. In Appendix A we list the laws of lattice theory 
for reference. The proofs of the axioms for functional and multivalued 
dependencies are listed in Appendix B. 

Unless otherwise stated, we refer to the universal relation as simply 
“the relation” in the remainder of this paper. 


II. RELATIONS


An attribute is a symbol taken from a finite set Ω = {A_1, A_2, ...,
A_n}. For each attribute A there is a set of possible values called its
domain, denoted DOM(A). We will use capital letters from the
beginning of the alphabet (A, B, ...) for single attributes, and capital letters
from the end of the alphabet (X, Y, ...) for sets of attributes. For a
set of attributes X ⊆ Ω, an X-value x is an assignment of values to the
attributes of X = {A_{i_1}, A_{i_2}, ..., A_{i_k}} from their respective domains.
The notation XY will be used to represent the union of two arbitrary
sets of attributes X, Y ⊆ Ω.

A relation R on the set of attributes Ω = {A_1, ..., A_n} is a subset of
the Cartesian product DOM(A_1) × ... × DOM(A_n). The elements
(rows) of R are called tuples. A relation R on {A_1, ..., A_n} will be
denoted by R[A_1 ... A_n]. Similarly, if R is defined on the union of sets
(X_1, X_2, ..., X_m), then the notation R[X_1 ... X_m] will be used. A
relation can be visualized as a table whose columns are labeled with
attributes and whose rows depict tuples. The ordering of the rows and
columns is immaterial. The cardinal of R is the total number of tuples
in R and is denoted by |R|.

Let t be a tuple in R[Ω]. For X ⊆ Ω, t[X] denotes the tuple that
contains the components of t corresponding to the attributes of X.
The projection of R on X, denoted by R[X], is defined as follows:


R[X] = {t[X] | t ∈ R}.


Similarly, the conditional projection of R on X by a Y-value y, where
Y ⊆ Ω, is defined as follows:


R_y[X] = {t[X] | t ∈ R and t[Y] = y}.


Let R[XZ] and S[YZ] be relations where X, Y, and Z are disjoint
sets of attributes. The join (natural join) of R and S, denoted by
R ⋈ S, is the relation T[XYZ] whose attributes are XYZ, and is
defined as follows:


T[XYZ] = R[XZ] ⋈ S[YZ]
       = {(x, y, z) | (x, z) ∈ R and (y, z) ∈ S}.


The join can also be defined as the union of a collection of Cartesian
products:


T[XYZ] = R[XZ] ⋈ S[YZ]
       = ∪{R_z[X] × S_z[Y] × {z} | z ∈ R[Z] ∩ S[Z]}.
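
The three operations defined above can be illustrated with a short Python sketch. It models a tuple as a frozenset of (attribute, value) pairs and a relation as a set of such tuples; this representation, the helper `row`, and the sample attribute names are assumptions made for the illustration only.

```python
def row(**values):
    # A tuple of a relation, as a hashable set of (attribute, value) pairs.
    return frozenset(values.items())


def project(rel, X):
    # R[X] = {t[X] | t in R}
    return {frozenset((a, v) for a, v in t if a in X) for t in rel}


def cond_project(rel, X, y):
    # R_y[X] = {t[X] | t in R and t[Y] = y}; y is given as a row over Y
    return {frozenset((a, v) for a, v in t if a in X) for t in rel if y <= t}


def natural_join(r, s):
    # Keep every pair of tuples that agrees on all shared attributes.
    out = set()
    for t in r:
        t_values = dict(t)
        for u in s:
            if all(t_values.get(a, v) == v for a, v in u):
                out.add(t | u)
    return out


# R[XZ] with X = name, Z = dept; S[YZ] with Y = floor, Z = dept.
R = {row(name="ann", dept="db"), row(name="bob", dept="os")}
S = {row(floor=1, dept="db"), row(floor=2, dept="db")}
T = natural_join(R, S)
```

Here "bob" drops out of the join because no tuple of S carries his Z-value, which is the behavior the union-of-products form makes explicit: only z in R[Z] ∩ S[Z] contributes.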


Let R be a relation on the set of attributes Ω. We may have two sets
of attributes X, Y ⊆ Ω, such that for any two tuples t_1, t_2 ∈ R, t_1[X]
= t_2[X] implies t_1[Y] = t_2[Y]. We say then that X functionally
determines Y in R, and denote this fact by X → Y. A functional dependency
(FD) X → Y is trivial, meaning it holds in all relations, if Y ⊆ X. Note
that FDs enjoy the projectivity and inverse projectivity properties.*”
For sets X, Y ⊆ Ω′ ⊆ Ω, the FD: X → Y is valid in R[Ω] iff it is valid
in R[Ω′].

We say that a set of relations {R[Ω_1], ..., R[Ω_n]} has the
information-lossless join property if Ω = Ω_1 ... Ω_n and


R[Ω] = R[Ω_1] ⋈ ... ⋈ R[Ω_n].


If the set {R[Ω_1], ..., R[Ω_n]} does not have this property, we say that
it has a lossy join.'* An important property of functional dependency”
is that if FD: X → Y is valid in R[Ω] then


R[Ω] = R[(Ω − Y)X] ⋈ R[XY].


This property will be discussed in more detail in Section VI.
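
The lossless decomposition induced by a functional dependency can be checked on a small relation. The Python sketch below uses the employee/child/salary/year data that appears later in Table I; the tuple representation and the helper names `project`, `natural_join`, and `holds` are assumptions made for the illustration, with salaries stored as plain integers (thousands of dollars).

```python
ATTRS = ("E", "C", "S", "Y")          # employee, child, salary, year
ROWS = [
    ("Hilbert", "Hubert", 35, 1975), ("Hilbert", "Hubert", 40, 1976),
    ("Gauss", "Gwendolyn", 40, 1975), ("Gauss", "Gwendolyn", 50, 1976),
    ("Gauss", "Greta", 40, 1975), ("Gauss", "Greta", 50, 1976),
    ("Pythagoras", "Peter", 15, 1975), ("Pythagoras", "Peter", 20, 1976),
]
R = {frozenset(zip(ATTRS, r)) for r in ROWS}


def project(rel, X):
    return {frozenset((a, v) for a, v in t if a in X) for t in rel}


def natural_join(r, s):
    return {t | u for t in r for u in s
            if all(dict(t).get(a, v) == v for a, v in u)}


def holds(rel, X, Y):
    # X -> Y: any two tuples that agree on X also agree on Y
    seen = {}
    for t in rel:
        values = dict(t)
        x_val = tuple(values[a] for a in sorted(X))
        y_val = tuple(values[a] for a in sorted(Y))
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


X, Y = {"C"}, {"E"}                   # child determines employee
lossless = natural_join(project(R, (set(ATTRS) - Y) | X),
                        project(R, X | Y)) == R
# Decomposing along S, which determines nothing here, gives a lossy join.
lossy = natural_join(project(R, {"C", "S"}), project(R, {"E", "S", "Y"}))
```

A lossy join never loses tuples of R; it adds spurious ones, so `lossy` is a proper superset of `R`.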


III. THE RELATION LATTICE


If S is a nonempty set, then a subset ρ of S × S is called a binary
relation on S. The product of two binary relations ρ, ρ′ ⊆ S × S is
defined as:


ρ ∘ ρ′ = {(a, b) ∈ S × S | ∃c ∈ S such that (a, c) ∈ ρ, (c, b) ∈ ρ′}.


We say that a relation ρ on S is reflexive if (a, a) ∈ ρ for every a in S;
that ρ is symmetric if ρ^{-1} = ρ, i.e., if


(∀a, b ∈ S), (a, b) ∈ ρ implies (b, a) ∈ ρ;

and that ρ is transitive if ρ ∘ ρ ⊆ ρ, i.e., if

(∀a, b, c ∈ S), (a, b) ∈ ρ and (b, c) ∈ ρ imply (a, c) ∈ ρ.


A binary relation is called an equivalence relation if it is reflexive,
symmetric, and transitive.

A family π = {B_i | i ∈ I} of subsets, called blocks of S, is said to form
a partition of S if the following conditions hold:


1. Each B_i is nonempty
2. For all i ≠ j in I, B_i ∩ B_j = ∅
3. ∪{B_i | i ∈ I} = S.


The two apparently different notions of “equivalence relation” and
“partition” are interchangeable: Let ρ be an equivalence relation on a
set S. Then the family of sets a_ρ = {b | (a, b) ∈ ρ}, a ∈ S, is a partition
of S. Conversely, if π = {B_i | i ∈ I} is a partition of S, then the relation
{(a, b) | (∃i ∈ I), a, b ∈ B_i} is an equivalence relation on S.

If ρ is an equivalence relation (partition) on S, we shall sometimes
write aρb as an alternative to (a, b) ∈ ρ. The sets a_ρ that form the
associated partition of the equivalence relation are called ρ-classes.
The set of ρ-classes is called the quotient set of S by ρ and is denoted
by S/ρ.

A binary relation ≤ on the set S is a partial ordering of S if and only
if ≤ is reflexive; antisymmetric, i.e., if


(∀a, b ∈ S), a ≤ b and b ≤ a imply a = b;


and transitive. A set S with a partial ordering ≤ is called a partially
ordered set (poset) and it is denoted by the pair (S, ≤).

Let (S, ≤) be a poset and let T be a subset of S. Then, a ∈ S is the
greatest lower bound (g.l.b.) of T iff

1. (∀t ∈ T), a ≤ t.

2. (∀t ∈ T), a′ ≤ t implies a′ ≤ a.

Similarly, a ∈ S is the least upper bound (l.u.b.) of T iff

1. (∀t ∈ T), t ≤ a.

2. (∀t ∈ T), t ≤ a′ implies a ≤ a′.

A lattice is a poset in which any two elements a and b have a g.l.b.,
called a meet and denoted by a·b, and a l.u.b., called a join and denoted
by a + b. We sometimes write the meet a·b as ab if no confusion is
created. The properties of the meet and join operations of a lattice’®
are listed in Appendix A.

Let the set of all partitions π_i on S be denoted by Π(S), and define
the partial ordering on Π(S) as follows:


If (∀a, b ∈ S), aπ_1b implies aπ_2b, then π_1 ≤ π_2.


The poset (Π(S), ≤) is seen to be a lattice (Π(S), ·, +) with a
universal lower bound 0 = {B_i | i ∈ I} such that every block B_i is a
singleton, and a universal upper bound 1 = {S}. To specify a
particular partition, we list the elements, and distinguish blocks with bars
and semicolons. For example, if S = {1, 2, 3, 4, 5} and partition π on
S has blocks {1, 3, 4}, {2, 5}, then we write π = {1, 3, 4; 2, 5}. The meet
and join of any two partitions π_1, π_2 ∈ Π(S) can be determined as
follows:

1. (∀a, b ∈ S), a(π_1·π_2)b iff aπ_1b and aπ_2b.

2. (∀a, b ∈ S), a(π_1 + π_2)b iff ∃n ∈ N and c_0, ..., c_n ∈ S such that
a = c_0, b = c_n, and c_iπ_1c_{i+1} or c_iπ_2c_{i+1} for each i, 0 ≤ i ≤ n − 1.
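
The two rules above translate directly into code. In the Python sketch below a partition is a set of frozensets; `meet` intersects blocks pairwise, and `join` merges blocks that are linked by a chain of overlaps. The representation and the helper `part` are assumptions made for the illustration; the two sample partitions are the ones that appear later as θ(E) and θ(S).

```python
def part(*blocks):
    return {frozenset(b) for b in blocks}


def meet(p1, p2):
    # a and b share a block of the meet iff they share a block in both p1 and p2
    return {b1 & b2 for b1 in p1 for b2 in p2 if b1 & b2}


def join(p1, p2):
    # Start from the blocks of p1 and fuse every group of blocks
    # that some block of p2 links together.
    blocks = [set(b) for b in p1]
    for b in p2:
        touched = [c for c in blocks if c & b]
        merged = set(b).union(*touched)
        blocks = [c for c in blocks if not (c & b)] + [merged]
    return {frozenset(c) for c in blocks}


p_e = part({1, 2}, {3, 4, 5, 6}, {7, 8})
p_s = part({1}, {2, 3, 5}, {4, 6}, {7}, {8})
lower = meet(p_e, p_s)     # {1; 2; 3, 5; 4, 6; 7; 8}
upper = join(p_e, p_s)     # {1, 2, 3, 4, 5, 6; 7, 8}
```

The join computed here is the one in Π(S); the join inside a sublattice or image set can be coarser.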

A complemented distributive lattice is called a Boolean algebra (see
Appendix A). The set of all subsets of S, called the power set of S, and
denoted by 2^S, with the partial ordering (∀S_1, S_2 ∈ 2^S), S_1 ≤ S_2 iff S_1
⊇ S_2, is a Boolean algebra (2^S, ·, +, ¯) with the universal bounds 0 =
S and 1 = ∅. The dual of a poset is the poset with the converse partial
ordering on the same elements. The Boolean algebra defined above is
the dual of the conventional Boolean algebra of the power set. The
operations of meet and join are defined by

1. Meet (g.l.b.) S_1·S_2 = S_1 ∪ S_2,

2. Join (l.u.b.) S_1 + S_2 = S_1 ∩ S_2,
and the complement of S_1 ∈ 2^S is S̄_1 = S − S_1.

Let ψ: L → M be a function from a lattice L into a lattice M. We
say ψ is a meet-morphism if


(∀a, b ∈ L), ψ(a·b) = ψ(a)·ψ(b),

and ψ is a join-morphism if

(∀a, b ∈ L), ψ(a + b) = ψ(a) + ψ(b).


Meet-morphisms and join-morphisms are both isotone
(order-preserving); i.e.,

(∀a, b ∈ L), a ≤ b implies ψ(a) ≤ ψ(b),


and any order-preserving one-to-one mapping with an inverse is an
isomorphism."®

Let R be a relation on the set of attributes Ω. The set of all subsets
of Ω, denoted 2^Ω, with the partial ordering defined by set-containment,
is a Boolean algebra (2^Ω, ·, +, ¯),’® where the meet, join, and
complement operations are defined as above. For every X ∈ 2^Ω, there is an
equivalence relation (partition) on the set of tuples in R[Ω] defined as
follows:

Definition 1: Let R be a relation on the set of attributes Ω. Each subset
of Ω is associated with a partition of the set of tuples of R. We define
the function θ: 2^Ω → Π(R[Ω]), which we call the partition function
(associated with R[Ω]), by


θ: X → θ(X) = {(t_1, t_2) ∈ R[Ω] × R[Ω] | t_1[X] = t_2[X]}.


In general, the image set Im(θ) of θ is not a sublattice of Π(R[Ω]).
Since π_1, π_2 ∈ Im(θ) implies π_1·π_2 ∈ Im(θ), Im(θ) is a complete lattice
in its own right,”° and it will be called the relation lattice of R[Ω], and
denoted by L(R[Ω]). Note that there are no duplicated tuples in R[Ω],
so that θ(Ω) = 0. Since the tuples cannot be “differentiated” by the
empty set of attributes, we define θ(∅) = 1. The universal bounds of
L(R[Ω]) are the same as those in Π(R[Ω]). We immediately recognize
the concept of functional dependency to be equivalent to the
refinement partial ordering of the partitions.

Lemma 1: Let R[Ω] be a relation on the set of attributes Ω, and let θ: 2^Ω
→ Π(R[Ω]) be the partition function associated with R[Ω], defined above.
Then


X → Y iff θ(X) ≤ θ(Y).

An immediate consequence of the above lemma is that the projection
R[X] of R[Ω] on X is simply the quotient of R[Ω] by θ(X), i.e., R[X] =
R[Ω]/θ(X). Thus each tuple in R[X] corresponds to a θ(X)-class in
R[Ω]/θ(X) and it takes the X-value only. Note that θ(X) = θ(Y) does
not imply R[X] = R[Y] because the attributes X and Y may have
different sets of values.


Theorem 1: Let R be a relation on the set of attributes Ω, and let L(R[Ω])
be the relation lattice of R. Then the partition function θ: 2^Ω → L(R[Ω])
is a meet-morphism.


Proof: We want to show that

θ(XY) = θ(X)θ(Y), ∀X, Y ∈ 2^Ω.

Suppose t_1θ(XY)t_2. Then, t_1[XY] = t_2[XY], which implies

t_1[X] = t_2[X] and t_1[Y] = t_2[Y].

Hence,

t_1θ(X)t_2 and t_1θ(Y)t_2.

By the definition of the meet operation, we have

t_1θ(X)θ(Y)t_2,

so that

θ(XY) ≤ θ(X)θ(Y).

Suppose t_1θ(X)θ(Y)t_2. Then,

t_1θ(X)t_2 and t_1θ(Y)t_2,

so that

t_1[X] = t_2[X] and t_1[Y] = t_2[Y],

and thus

t_1[XY] = t_2[XY].

Consequently,

t_1θ(XY)t_2,

so that

θ(X)θ(Y) ≤ θ(XY).

Hence

θ(XY) = θ(X)θ(Y).


Note that the partition function θ is order-preserving, but it is in
general not a join-morphism.* However, if θ(X + Y) = θ(X) + θ(Y)
holds in L(R[Ω]), the pair (X, Y) has a special property in the relation.
This is discussed further in Section VI.

It is clear now that Armstrong’s axioms for functional dependencies
become theorems within the framework of lattice theory. The proofs
of the axioms for functional dependencies are given in Appendix B.

Let R be a relation on the set of attributes Ω, and let θ: 2^Ω → L(R[Ω])
be the partition function associated with R[Ω]. Then the relation
θ ∘ θ^{-1} on 2^Ω defined by


θ ∘ θ^{-1} = {(X, Y) ∈ 2^Ω × 2^Ω | θ(X) = θ(Y)}


is obviously an equivalence relation. Sets in the quotient set 2^Ω/θ ∘ θ^{-1}
will be called θ classes.


* The join of π_1 and π_2 in L(R[Ω]) may be different from their join in Π(R[Ω]). We
will use the notation π_1 ⊕ π_2 to denote the join of π_1 and π_2 in Π(R[Ω]), while π_1 + π_2
will denote the join of π_1 and π_2 in L(R[Ω]); e.g., in Example 1 below, θ(E) ⊕ θ(S) = {1,
2, 3, 4, 5, 6; 7, 8}, and θ(E) + θ(S) = {1, 2, 3, 4, 5, 6, 7, 8}.

Table I—Relation R[ECSY]

Tuple  Employee    Child      Salary  Year
1      Hilbert     Hubert     $35K    1975
2      Hilbert     Hubert     $40K    1976
3      Gauss       Gwendolyn  $40K    1975
4      Gauss       Gwendolyn  $50K    1976
5      Gauss       Greta      $40K    1975
6      Gauss       Greta      $50K    1976
7      Pythagoras  Peter      $15K    1975
8      Pythagoras  Peter      $20K    1976

Example 1: Consider the relation R in Table I (see Ref. 7). Let Ω =
{E, C, S, Y} be the set of attributes, where E = employee, C = child,
S = salary, Y = year. Then


2^Ω = {∅, E, C, S, Y, EC, ES, EY, CS, CY, SY, ECS,
       ECY, ESY, CSY, ECSY},

and

θ(∅) = {1, 2, 3, 4, 5, 6, 7, 8} = 1,
θ(E) = {1, 2; 3, 4, 5, 6; 7, 8} = π_1,
θ(C) = θ(EC) = {1, 2; 3, 4; 5, 6; 7, 8} = π_2,
θ(S) = {1; 2, 3, 5; 4, 6; 7; 8} = π_3,
θ(Y) = {1, 3, 5, 7; 2, 4, 6, 8} = π_4,
θ(ES) = θ(EY) = θ(SY) = θ(ESY)
      = {1; 2; 3, 5; 4, 6; 7; 8} = π_5,
θ(CS) = θ(CY) = θ(ECY) = θ(ECS) = θ(CSY) = θ(ECSY)
      = {1; 2; 3; 4; 5; 6; 7; 8} = 0.


The Hasse diagram’®”! of the relation lattice is illustrated in Fig. 1. □
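
The partitions of Example 1 can be recomputed from Table I. The Python sketch below is one way to do so: `theta` groups tuple numbers by their X-value, `refines` tests the partial ordering of Lemma 1, and `meet` is the blockwise intersection used in Theorem 1. The function names and the encoding of salaries as integers are assumptions made for the illustration.

```python
from itertools import combinations

ATTRS = "ECSY"
ROWS = {
    1: ("Hilbert", "Hubert", 35, 1975), 2: ("Hilbert", "Hubert", 40, 1976),
    3: ("Gauss", "Gwendolyn", 40, 1975), 4: ("Gauss", "Gwendolyn", 50, 1976),
    5: ("Gauss", "Greta", 40, 1975), 6: ("Gauss", "Greta", 50, 1976),
    7: ("Pythagoras", "Peter", 15, 1975), 8: ("Pythagoras", "Peter", 20, 1976),
}


def theta(X):
    # Partition function: tuples with equal X-values share a block.
    blocks = {}
    for tid, values in ROWS.items():
        key = tuple(values[ATTRS.index(a)] for a in X)
        blocks.setdefault(key, set()).add(tid)
    return frozenset(frozenset(b) for b in blocks.values())


def refines(p, q):
    # p <= q: every block of p lies inside some block of q
    return all(any(b <= c for c in q) for b in p)


def meet(p, q):
    return frozenset(b & c for b in p for c in q if b & c)


SUBSETS = ["".join(c) for k in range(len(ATTRS) + 1)
           for c in combinations(ATTRS, k)]
lattice = {theta(X) for X in SUBSETS}          # the image set Im(theta)
meet_morphic = all(
    theta("".join(set(X) | set(Y))) == meet(theta(X), theta(Y))
    for X in SUBSETS for Y in SUBSETS
)
```

By Lemma 1, `refines(theta("C"), theta("E"))` is the statement that C → E holds in Table I, while the converse test fails because Gauss has two children.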


IV. LIST OF KEYS 


Let R be a relation on the set of attributes Ω. We say that X ⊆ Ω is
a superkey of R if X → A, ∀A ∈ Ω. If X is a superkey and no proper
subset of X is a superkey, X is said to be a key of R.*”°


Lemma 2: X ⊆ Ω is a superkey of R iff θ(X) = 0.
Proof: (Necessity) Let Ω = {A_1, ..., A_n}, and X → A_i, ∀A_i ∈ Ω. Then,
by Lemma 1,

θ(X) ≤ θ(A_i), ∀A_i ∈ Ω.

By the definition of the meet operation, we have

θ(X) ≤ θ(A_1)θ(A_2) ... θ(A_n).

It follows from Theorem 1 that

θ(X) ≤ θ(A_1A_2 ... A_n) = θ(Ω) = 0.

Hence,

θ(X) = 0.

(Sufficiency) Suppose θ(X) = 0. Then

θ(X) ≤ θ(A_i), ∀A_i ∈ Ω.

Hence,

X → A_i, ∀A_i ∈ Ω. ∎

Fig. 1—Relation lattice L(R[Ω]).


An ideal is a subset J of a lattice L with the properties'®

1. a ∈ J, x ∈ L, and x ≤ a, imply x ∈ J,

2. a, b ∈ J implies a + b ∈ J.

For every a ∈ L, the subset of all elements “less than or equal to” a is
evidently an ideal; it is called the principal ideal of L generated by a,
and is denoted by (a], i.e.,


(a] = {x ∈ L | x ≤ a}.

Definition 2: Let R be a relation on the set of attributes Ω = {A_1,
..., A_n}. For each A_i ∈ Ω, J_i = (θ(A_i)] is the principal ideal of the
relation lattice L(R[Ω]) generated by θ(A_i). A Boolean function

f_i(A_1, ..., A_n) = Σ_{θ(X) ∈ J_i} X

defined on 2^Ω is the Boolean sum of all X ∈ 2^Ω such that θ(X) ≤ θ(A_i).
We will call f_i the principal ideal function (generated by A_i). ∎

This function plays a role similar to the Boolean function used in
Ref. 16.

Theorem 2: Let R be a relation on Ω = {A_1, ..., A_n}. X ⊆ Ω is a
superkey of R iff X is a product term in the expansion of the Boolean
function


F(A_1, ..., A_n) = Π_{i=1}^{n} f_i(A_1, ..., A_n),


where f_i is the principal ideal function generated by A_i.
Proof: The Boolean function F(A_1, ..., A_n) has the expansion


F(A_1, ..., A_n) = Π_{i=1}^{n} f_i = Σ_{θ(X_i) ∈ J_i} X_1 ... X_n.


We want to show that every term K = X_1 ... X_n is a superkey. Since
θ(X_i) ∈ J_i = (θ(A_i)], it follows that

θ(X_i) ≤ θ(A_i), 1 ≤ i ≤ n.

From L6 in Appendix A, we have

θ(X_1)θ(X_2) ... θ(X_n) ≤ θ(A_1)θ(A_2) ... θ(A_n).

It follows from Theorem 1 that

θ(X_1X_2 ... X_n) ≤ θ(A_1 ... A_n) = θ(Ω) = 0,

and thus

θ(X_1X_2 ... X_n) = 0.

Hence, K = X_1X_2 ... X_n is a superkey of R.
Conversely, suppose X is a superkey of R. Then

X → A_i, ∀A_i ∈ Ω.

Thus,

θ(X) ≤ θ(A_i), 1 ≤ i ≤ n.

By the definition of the principal ideal J_i, we must have

θ(X) ∈ J_i = (θ(A_i)], 1 ≤ i ≤ n.


It follows that X = X · ... · X (n times) is a product term in the
expansion of F(A_1, ..., A_n).

It is natural to call F(A_1, ..., A_n) the key Boolean function of the
relation R[A_1 ... A_n]. Since any key X is a superkey of R, X must be
a product term of the key Boolean function F(A_1, ..., A_n). Since no
proper subset of X is a superkey, then by the definition of the prime
implicant of a Boolean function,” we have
Corollary 1: Let R be a relation on the set of attributes Ω = {A_1, ...,
A_n}. X ⊆ Ω is a key of R iff X is a prime implicant of the key Boolean
function F(A_1, ..., A_n). ∎

An attribute A ∈ Ω is prime in R[Ω] if A belongs to at least one key of
R; otherwise A is nonprime. A ∈ Ω is a nonprime attribute if and only
if the key Boolean function is independent of A.


Theorem 3: Let R be a relation on Ω. A ∈ Ω is a nonprime attribute iff
there exists X ⊆ Ω such that

1. A ∉ X, X → A,

2. AZ → X implies Z → X.


Proof: (Necessity) Let A ∈ Ω be a nonprime attribute, and let X be
any key of R. Then


A ∉ X, and X → A.


Suppose AZ → X. Then θ(AZ) ≤ θ(X) = 0. It follows that θ(AZ) = 0
and thus AZ is a superkey; it contains a key K ⊆ AZ and A ∉ K. We
have K ⊆ Z, so that


θ(Z) ≤ θ(K) = 0 = θ(X).

Hence,

Z → X.


(Sufficiency) Let Ω = {A_1, ..., A_n}, n ≥ 2. Assume there is an X =
A_2 ... A_m, such that (1) X → A_1 and (2) A_1Z → X implies Z → X.
We want to show that A_1 must be a nonprime attribute. The key
Boolean function F(A_1, ..., A_n) of R[Ω] can be written in the form


F(A_1, ..., A_n) = Π_{i=1}^{n} f_i = f_1 f_X f_{m+1} ... f_n
                 = (f_1 f_X f_{m+1}) ... (f_1 f_X f_n),


where f_X = f_2 ... f_m. For any product term Y in f_X we have

θ(Y) ≤ θ(X) ≤ θ(A_1).

Therefore, Y must be a term in f_1. It follows that f_1 has the form

f_1 = f_X + g


for some Boolean function g. Since θ(A_1Z) ≤ θ(X) implies θ(Z) ≤ θ(X),
f_X can be written in the form


f_X = A_1h + h + p = h + p


for some Boolean functions h and p which are independent of A_1. Also,
every f_j, j = m + 1, ..., n, can be written in the form


f_j = f_1e + f_Xe + q


for some Boolean functions e and q, which are independent of A_1. It
follows that

f_1 f_X f_j = f_1 f_X (f_1e + f_Xe + q)
            = f_X (f_1e + f_1 f_Xe + f_1q)
            = f_X (f_1e + f_1q)
            = f_1 f_X (e + q) = (f_X + g) f_X (e + q)
            = f_X (e + q) = (h + p)(e + q).

Since h, p, e, and q are all independent of A_1, we know that f_1 f_X f_j is
independent of A_1 for all j = m + 1, ..., n. Clearly, no prime implicant
of F(A_1, ..., A_n) contains A_1, and therefore A_1 is a nonprime
attribute. □


Example 2: Consider the relation R in Example 1. To obtain the prime
implicants of the key Boolean function F, we can first simplify each
principal ideal function. The principal ideal functions of the relation
R[ECSY] are


f_E = E + C + SY,
f_C = C,
f_S = S + EY + CY,
f_Y = Y + ES + CS,

and the key Boolean function is

F(E, C, S, Y) = (E + C + SY)·C·(S + EY + CY)·(Y + ES + CS)
              = CS + CY.


The sets CS and CY are the keys, and E is the only nonprime
attribute. □
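
The keys of Example 2 can be cross-checked by brute force, using Lemma 2 (X is a superkey iff θ(X) = 0) rather than Boolean simplification. The Python sketch below is such a check, not the prime-implicant method of Corollary 1; `ideal_terms` lists the minimal attribute sets X with θ(X) ≤ θ(A), which are the terms of the simplified principal ideal function. All function names are assumptions made for the illustration.

```python
from itertools import combinations

ATTRS = "ECSY"
ROWS = {
    1: ("Hilbert", "Hubert", 35, 1975), 2: ("Hilbert", "Hubert", 40, 1976),
    3: ("Gauss", "Gwendolyn", 40, 1975), 4: ("Gauss", "Gwendolyn", 50, 1976),
    5: ("Gauss", "Greta", 40, 1975), 6: ("Gauss", "Greta", 50, 1976),
    7: ("Pythagoras", "Peter", 15, 1975), 8: ("Pythagoras", "Peter", 20, 1976),
}
SUBSETS = ["".join(c) for k in range(len(ATTRS) + 1)
           for c in combinations(ATTRS, k)]


def theta(X):
    blocks = {}
    for tid, values in ROWS.items():
        key = tuple(values[ATTRS.index(a)] for a in X)
        blocks.setdefault(key, set()).add(tid)
    return frozenset(frozenset(b) for b in blocks.values())


def refines(p, q):
    return all(any(b <= c for c in q) for b in p)


def minimal(sets):
    # Keep only the sets that have no proper subset in the list.
    return [X for X in sets if not any(set(Y) < set(X) for Y in sets)]


def ideal_terms(A):
    # Minimal X with theta(X) <= theta(A): terms of the simplified f_A.
    return minimal([X for X in SUBSETS if refines(theta(X), theta(A))])


superkeys = [X for X in SUBSETS if len(theta(X)) == len(ROWS)]   # theta(X) = 0
keys = minimal(superkeys)
nonprime = [a for a in ATTRS if not any(a in k for k in keys)]
```

Enumerating all 2^n subsets is only practical for small attribute sets.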


V. BOYCE-CODD NORMAL FORM 


Normalization is a logical database design process that can be viewed 
as the decomposition of a relation into a set of subrelations, such that 
the original relation can be regenerated by the joins of the subrelations. 
The purpose of decomposition is to separate the independent compo- 
nents into distinct relations, to avoid updating anomalies.” It is claimed 
in Ref. 4 that the Boyce-Codd Normal Form is one that is free of 
insertion and deletion anomalies. This section is devoted to the BCNF 
and its relation lattice. A modified algorithm for synthesizing an 
information-lossless BCNF® is included, based on the concept of the 
principal filter of the relation lattice. 

Recall that a functional dependency X → Y is trivial if Y ⊆ X. A
relation R[Ω] is said to be in Boyce-Codd Normal Form if, for all
nontrivial FDs X → Y, X is a superkey.”*
Definition 3: Relation R[Ω] is in BCNF if X → Y implies either
1. X is a superkey, i.e., θ(X) = 0,
or
2. Y ⊆ X. ∎
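
Definition 3 can be tested directly on a small relation. The Python sketch below lists every nontrivial dependency X → A of Table I whose left side is not a superkey; checking single attributes A suffices because X → Y holds iff X → A holds for each A in Y. The relation of Table I is used as sample data, and the names `theta`, `refines`, and `bcnf_violations` are assumptions made for the illustration.

```python
from itertools import combinations

ATTRS = "ECSY"
ROWS = {
    1: ("Hilbert", "Hubert", 35, 1975), 2: ("Hilbert", "Hubert", 40, 1976),
    3: ("Gauss", "Gwendolyn", 40, 1975), 4: ("Gauss", "Gwendolyn", 50, 1976),
    5: ("Gauss", "Greta", 40, 1975), 6: ("Gauss", "Greta", 50, 1976),
    7: ("Pythagoras", "Peter", 15, 1975), 8: ("Pythagoras", "Peter", 20, 1976),
}
SUBSETS = ["".join(c) for k in range(len(ATTRS) + 1)
           for c in combinations(ATTRS, k)]


def theta(X):
    blocks = {}
    for tid, values in ROWS.items():
        key = tuple(values[ATTRS.index(a)] for a in X)
        blocks.setdefault(key, set()).add(tid)
    return frozenset(frozenset(b) for b in blocks.values())


def refines(p, q):
    return all(any(b <= c for c in q) for b in p)


def bcnf_violations():
    # Pairs (X, A) with X -> A nontrivial and X not a superkey.
    found = []
    for X in SUBSETS:
        if len(theta(X)) == len(ROWS):        # theta(X) = 0: X is a superkey
            continue
        for A in ATTRS:
            if A not in X and refines(theta(X), theta(A)):
                found.append((X, A))
    return found


violations = bcnf_violations()
in_bcnf = not violations
```

For Table I the list is nonempty (for instance C → E with C not a superkey), so this relation is not in BCNF.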


If a relation is in BCNF, we will show that its relation lattice has 
some special properties. To analyze these properties we need the 
concept of the principal filter.’® 

An ideal of the dual of the lattice $L$ is called a filter of $L$. A subset
$M$ of $L$ is a filter of $L$ if

1. $a \in M$, $x \in L$, and $x \ge a$ imply $x \in M$,

2. $a, b \in M$ implies $a \cdot b \in M$.

For every $a \in L$, the subset of all elements "greater than or equal to"
$a$ is a filter; it is called the principal filter of $L$ generated by $a$, and is
denoted by $[a)$, i.e.,

$$[a) = \{x \in L \mid x \ge a\}.$$


If a and b are elements of a lattice L, where a < b, and there is no c 
€ L such that a < c < b, then we say that a is covered by b (or b covers 
a).'® An element that covers the universal lower bound 0 of L is 
referred to as an atom of L."® 
Definition 4: Let $R$ be a relation on the set of attributes $\Omega$, and let $\pi$
be an atom of the relation lattice $L(R[\Omega])$. Let $\Omega_\pi = \{A \mid A \in \Omega,\ \theta(A) \ge \pi\} \subseteq \Omega$.
Then the projection $R[\Omega_\pi]$ of $R$ on $\Omega_\pi$ is called an atomic
projection, and $[\pi)$ is called an atomic filter.

It is easy to verify that the relation lattice of the atomic projection
$R[\Omega_\pi]$ is isomorphic to the principal filter $[\pi)$ of $L(R[\Omega])$ generated by
$\pi$.

Definition 5: Let $R$ be a relation on the set of attributes $\Omega$, and let $\pi
\in L(R[\Omega])$ be an atom. The principal filter $[\pi)$ of $L(R[\Omega])$ is called
normal iff whenever $X \to Y$ is valid in the atomic projection $R[\Omega_\pi]$
then $Y \subseteq X$; otherwise, it is called abnormal.

Lemma 3: A relation $R[\Omega]$ is in BCNF iff every atomic filter of $L(R[\Omega])$
is normal.

Proof: (Necessity) Trivial.

(Sufficiency) Suppose $X \to Y$ and $X$ is not a superkey, i.e., $\theta(X) \ne 0$.
Then there must exist an atom $\pi$ such that

$$0 < \pi \le \theta(X) \le \theta(Y).$$

It follows that $X, Y \subseteq \Omega_\pi$ and that $X \to Y$ is valid in the atomic
projection $R[\Omega_\pi]$, which is assumed normal. Therefore $Y \subseteq X$.

The join operation in the Boolean algebra $(2^{\Omega}, \cdot, +, \bar{\ })$ is not always
preserved by the $\theta$ mapping. But for a relation $R[\Omega]$ in BCNF, if $X, Y
\subseteq \Omega$ and neither $X$ nor $Y$ is a superkey of $R[\Omega]$, then the join $X + Y$
is preserved by $\theta$. We have

Corollary 2: If $R[\Omega]$ is in BCNF, $X, Y \subseteq \Omega$, $\theta(X) \ne 0$, and $\theta(Y) \ne 0$,
then

$$\theta(X + Y) = \theta(X) + \theta(Y).$$

Proof: Since $X + Y \subseteq X$ and $X + Y \subseteq Y$, we have

$$\theta(X) \le \theta(X + Y) \quad\text{and}\quad \theta(Y) \le \theta(X + Y).$$

By definition of the join operation, we have

$$\theta(X) + \theta(Y) \le \theta(X + Y).$$

Suppose there is a $Z \subseteq \Omega$ such that

$$\theta(X) \le \theta(Z) \quad\text{and}\quad \theta(Y) \le \theta(Z).$$

Given $\theta(X) \ne 0$ and $\theta(Y) \ne 0$, we have

$$Z \subseteq X \quad\text{and}\quad Z \subseteq Y.$$

Thus,

$$Z \subseteq X + Y,$$

so that

$$\theta(X + Y) \le \theta(Z).$$

By the definition of least upper bound, we have

$$\theta(X + Y) = \theta(X) + \theta(Y). \quad □$$


The most important characteristic of the BCNF is given in the 
following theorem. 


Theorem 4: The relation $R[\Omega]$ is in BCNF iff every atomic filter $[\pi)$ of
$L(R[\Omega])$ is isomorphic to the Boolean algebra $(2^{\Omega_\pi}, \cdot, +, \bar{\ })$.

Proof: (Necessity) Since $[\pi)$ is a meet-morphic image of $\theta$ restricted to
$2^{\Omega_\pi}$, it is sufficient to show that $\theta$ is a one-to-one mapping on $2^{\Omega_\pi}$. Let
$X, Y \in 2^{\Omega_\pi}$, and $\theta(X) = \theta(Y)$. It follows that

$$\theta(X) = \theta(Y - X) \cdot \theta(Y + X) \le \theta(Y - X),$$

which implies

$$X \to Y - X.$$

Since $0 < \pi \le \theta(X)$ and $[\pi)$ is normal, we have

$$Y - X \subseteq X.$$

Hence,

$$Y \subseteq X.$$

Similarly,

$$X \subseteq Y.$$

Therefore, $X = Y$, and $\theta$ is a one-to-one mapping on $2^{\Omega_\pi}$.

(Sufficiency) Suppose $X \to Y$ is valid in $R[\Omega_\pi]$. Then $\theta(X) \le \theta(Y)$.
Since the inverse of an isomorphism is also order-preserving, it follows
that $X \supseteq Y$. Therefore, $[\pi)$ is normal and $R[\Omega]$ is in BCNF. □

The above theorem implies that if $[\pi)$ is normal, the only key of
$R[\Omega_\pi]$ is $\theta^{-1}(\pi) = \Omega_\pi$.

It is known that any relation has a lossless-join decomposition into 
Boyce-Codd Normal Form, and an algorithm for determining the 
decomposition is given in Ref. 6. We will show how the concept of the 
principal filter can be used to modify this algorithm. In the algorithm 
for synthesizing the Third Normal Form,° a concept similar to the 
principal filter is used implicitly by Bernstein when he partitions the 
functional dependencies (Step 2). Before describing the improved 
algorithm, we need the following: 


Lemma 4: Let $R$ be a relation on $\Omega$. Let $\pi \in L(R[\Omega])$ be an atom of the
relation lattice, and let $K$ be a key of the atomic projection $R[\Omega_\pi]$. Then,

$$R[\Omega] = R[(\Omega - \Omega_\pi)K] \bowtie R[\Omega_\pi].$$

Proof: $K \subseteq \Omega_\pi$ and $K \to \Omega_\pi$.

The algorithm for determining the lossless-join decomposition into
BCNF is simply to construct a sequence of decompositions $D_i = (R_1,
\dots, R_n)$ of $R$, each with lossless join. Initially, let $D_0$ consist of $R$
alone. If $T[\Omega]$ is a relation in $D_i$, and $T[\Omega]$ is not in BCNF, let $\pi$ be an
atom of $L(T[\Omega])$ for which the principal filter $[\pi)$ is abnormal. Let $K$
be a key of the atomic projection $T[\Omega_\pi]$. Now replace $T[\Omega]$ in $D_i$ by
$T[(\Omega - \Omega_\pi)K]$ and $T[\Omega_\pi]$ to obtain $D_{i+1}$. Continue the process until all
the relations in the decomposition $D_i$ are in BCNF.


Table II—Relation R[MSPCNY]

       Model    Serial
       Number   Number   Price   Color   Name     Year

  1    1234     342      13.25   blue    pot      1974
  2    1234     347      13.25   red     pot      1974
  3    1234     410      14.23   red     pot      1975
  4    1465     347       9.45   black   pan      1974
  5    1465     390       9.82   black   pan      1976
  6    1465     392       9.82   red     pan      1976
  7    1465     401       9.82   red     pan      1976
  8    1465     409       9.82   blue    pan      1976
  9    1623     311      22.34   blue    kettle   1973
 10    1623     390      30.21   blue    kettle   1976
 11    1623     410      28.55   black   kettle   1975
 12    1623     423      28.55   black   kettle   1975
 13    1623     428      28.55   blue    kettle   1975
 14    1654     435      28.55   red     kettle   1975
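
The procedure above can be sketched in a few lines for small relation instances. This is an illustrative brute-force sketch, not the paper's implementation: it assumes the relation lattice consists of the partitions $\theta(X)$ for $X \subseteq \Omega$, that tuples are dictionaries without duplicates, and that an atomic filter is abnormal exactly when the atomic projection has a key smaller than $\Omega_\pi$ (the remark following Theorem 4). All function names are hypothetical.

```python
from itertools import combinations

def project(rows, attrs):
    """Projection with duplicate elimination; rows are dicts."""
    seen = {tuple((a, r[a]) for a in sorted(attrs)) for r in rows}
    return [dict(t) for t in seen]

def theta(rows, attrs):
    """Partition of tuple indices induced by agreement on attrs."""
    blocks = {}
    for i, r in enumerate(rows):
        blocks.setdefault(tuple(r[a] for a in sorted(attrs)), set()).add(i)
    return frozenset(frozenset(b) for b in blocks.values())

def refines(p, q):
    """p <= q: every block of p lies inside some block of q."""
    return all(any(b <= c for c in q) for b in p)

def subsets(attrs):
    attrs = sorted(attrs)
    return [set(c) for r in range(len(attrs) + 1) for c in combinations(attrs, r)]

def abnormal_atom(rows, attrs):
    """Return (Omega_pi, key K) for an abnormal atomic filter, or None if BCNF."""
    zero = theta(rows, attrs)
    elems = {theta(rows, X) for X in subsets(attrs)} - {zero}
    for pi in elems:
        if any(q != pi and refines(q, pi) for q in elems):
            continue                      # pi does not cover 0, so not an atom
        omega_pi = {A for A in attrs if refines(pi, theta(rows, [A]))}
        keys = [X for X in subsets(omega_pi) if theta(rows, X) == pi]
        key = min(keys, key=len)          # a smallest superkey is a key
        if key != omega_pi:               # a proper subset determines Omega_pi
            return omega_pi, key
    return None

def bcnf_decompose(rows, attrs):
    """Repeatedly split T into T[(Omega - Omega_pi)K] and T[Omega_pi]."""
    todo, done = [(rows, set(attrs))], []
    while todo:
        rel, sigma = todo.pop()
        hit = abnormal_atom(rel, sigma)
        if hit is None:
            done.append(sigma)
            continue
        omega_pi, key = hit
        rest = (sigma - omega_pi) | key
        todo.append((project(rel, rest), rest))
        todo.append((project(rel, omega_pi), omega_pi))
    return done
```

For example, on a three-tuple relation over $ABC$ in which $A \to B$ holds but $A$ is not a superkey, the sketch returns the schemes $AC$ and $AB$. Which abnormal atom and which key are picked first is arbitrary, as in the text, so other inputs may admit several valid results.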


Example 3: Let us consider the relation R[MSPCNY] from Ref. 23, 
where M = model number, S = serial number, P = price, C = color, N 
= name, and Y = year. The tuples of the relation R[MSPCNY] are 
shown in Table II. 
The Hasse diagram of the relation lattice $L(R[\Omega])$ is illustrated in Fig.
2, where

$$\pi_1 = \{1, 8, 9, 10, 13;\ 2, 3, 6, 7, 14;\ 4, 5, 11, 12\},$$
$$\pi_2 = \{1, 2, 3;\ 4, 5, 6, 7, 8;\ 9, 10, 11, 12, 13, 14\},$$
$$\pi_3 = \{1, 2, 4;\ 3, 11, 12, 13, 14;\ 5, 6, 7, 8, 10;\ 9\},$$
$$\pi_4 = \{1, 2, 3;\ 4, 5, 6, 7, 8;\ 9, 10, 11, 12, 13;\ 14\},$$
$$\pi_5 = \{1, 2;\ 3;\ 4;\ 5, 6, 7, 8;\ 9;\ 10;\ 11, 12, 13, 14\},$$

and $\theta(C) = \pi_1$, $\theta(N) = \pi_2$, $\theta(Y) = \pi_3$, $\theta(M) = \pi_4$, $\theta(P) = \pi_5$, $\theta(S) = \pi_8$.
For $X \subseteq \Omega$, $\theta(X)$ can be obtained easily by carrying out the meet
operations on the attributes in $X$.

Fig. 2—Relation lattice L(R[MSPCNY]).


The principal ideal functions of $R[MSPCNY]$ are

$$f_C(M, S, P, C, N, Y) = C + MS + NS + PS,$$
$$f_N(M, S, P, C, N, Y) = N + M + P + CY + CS,$$
$$f_P(M, S, P, C, N, Y) = P + MY + CS + MS + NS,$$
$$f_M(M, S, P, C, N, Y) = M + CN + CP + CY + NS + PS + CS,$$
$$f_Y(M, S, P, C, N, Y) = Y + P + S,$$
$$f_S(M, S, P, C, N, Y) = S,$$

and the key Boolean function is

$$F(M, S, P, C, N, Y) = (C + MS + NS + PS) \cdot (N + M + P + CY + CS) \cdot (P + MY + CS + MS + NS) \cdot (M + CN + CP + CY + NS + PS + CS) \cdot (Y + P + S) \cdot S = CS + MS + NS + PS.$$

The keys of $R[MSPCNY]$ are $\{CS, MS, NS, PS\}$, and $Y$ is the only
nonprime attribute.
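
The product above can be reproduced mechanically by multiplying the sums of products and applying the absorption law $X + XY = X$. The following is a minimal sketch; the helper names (`sop`, `absorb`, `multiply`, `key_function`) are illustrative assumptions, and each product term is represented as a set of single-letter attribute names.

```python
from itertools import product

def sop(text):
    """Parse 'C + MS + NS' into a set of product terms (frozensets of letters)."""
    return {frozenset(term.strip()) for term in text.split("+")}

def absorb(terms):
    """Absorption law: drop any term that properly contains another term."""
    terms = set(terms)
    return {t for t in terms if not any(u < t for u in terms)}

def multiply(f, g):
    """Boolean product of two sums of products, followed by absorption."""
    return absorb({a | b for a, b in product(f, g)})

def key_function(ideal_functions):
    """Product of all principal ideal functions; the terms left are the keys."""
    result = {frozenset()}                 # the constant 1 (empty product term)
    for f in ideal_functions:
        result = multiply(result, absorb(f))
    return result

# Principal ideal functions of R[MSPCNY] as listed in Example 3.
F = key_function(sop(s) for s in [
    "C + MS + NS + PS",
    "N + M + P + CY + CS",
    "P + MY + CS + MS + NS",
    "M + CN + CP + CY + NS + PS + CS",
    "Y + P + S",
    "S",
])
print(sorted("".join(sorted(t)) for t in F))
```

Because every function involved is monotone (no complemented variables), expanding the product and absorbing leaves exactly the prime implicants, which is why the surviving terms can be read off as the keys.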

Initially, let $D_0 = \{R[MSPCNY]\}$. Since both atomic filters $[\pi_8)$ and
$[\pi_9)$ are abnormal, we arbitrarily choose $\pi_9$, and let $\Sigma = \Omega_{\pi_9} = MPCNY$.
The relation lattice of $R[\Sigma]$ is isomorphic to $[\pi_9)$. The principal ideal
functions of $R[\Sigma]$ are

$$g_C(M, P, C, N, Y) = f_C(M, 0, P, C, N, Y) = C,$$
$$g_N(M, P, C, N, Y) = f_N(M, 0, P, C, N, Y) = N + M + P + CY,$$
$$g_P(M, P, C, N, Y) = f_P(M, 0, P, C, N, Y) = P + MY,$$
$$g_M(M, P, C, N, Y) = f_M(M, 0, P, C, N, Y) = M + CN + CP + CY,$$
$$g_Y(M, P, C, N, Y) = f_Y(M, 0, P, C, N, Y) = Y + P,$$

and the key Boolean function is

$$G(M, P, C, N, Y) = C \cdot (N + M + P + CY) \cdot (P + MY) \cdot (M + CN + CP + CY) \cdot (Y + P) = CP + CMY.$$

We choose the key $CP$ and replace $R[MSPCNY]$ in $D_0$ by $R[(\Omega - \Sigma)K]
= R[SPC]$ and $R[\Sigma] = R[MPCNY]$ to obtain $D_1 = \{R[SPC],
R[MPCNY]\}$. The relation $R[SPC]$ and its lattice are shown in Table
III and Fig. 3, respectively.

The relation $R[SPC]$ is in BCNF, but the relation $R[MPCNY]$ is
not. The relation lattice of $R[MPCNY]$ is isomorphic to the filter $[\pi_9)$.
We will not duplicate the figure. Both "atoms" $\pi_6$ and $\pi_7$ of $[\pi_9)$ are
abnormal. We choose the filter $[\pi_7)$. The principal ideal functions of
$R[\Sigma_{\pi_7}] = R[MPNY]$ are

$$h_M(M, P, N, Y) = g_M(M, P, 0, N, Y) = M,$$
$$h_N(M, P, N, Y) = g_N(M, P, 0, N, Y) = N + M + P,$$
$$h_P(M, P, N, Y) = g_P(M, P, 0, N, Y) = P + MY,$$
$$h_Y(M, P, N, Y) = g_Y(M, P, 0, N, Y) = Y + P,$$

and the key Boolean function of $R[MPNY]$ is given by

$$H(M, P, N, Y) = M \cdot (N + M + P) \cdot (P + MY) \cdot (Y + P) = MP + MY.$$

We choose the key $K' = MP$ and replace $R[MPCNY]$ in $D_1$ by $R[(\Sigma
- \Sigma_{\pi_7})K'] = R[MPC]$ and $R[MPNY]$ to obtain $D_2 = \{R[SPC], R[MPC],
R[MPNY]\}$. The relation $R[MPC]$ and its relation lattice are illustrated
in Table IV and Fig. 4, respectively.

Now we have to decompose the relation $R[MPNY]$ in $D_2$. The
relation lattice of $R[MPNY]$ is isomorphic to $[\pi_7)$ of $L(R[MSPCNY])$.
We choose the abnormal filter that is isomorphic to $[\pi_5)$. Since $\Sigma_{\pi_5} =
PNY$ and the only key is $P$, we can replace $R[MPNY]$ in $D_2$ by $R[MP]$
and $R[PNY]$ to obtain $D_3 = \{R[SPC], R[MPC], R[MP], R[PNY]\}$. All
the relations in $D_3$ are in BCNF. The relations $R[MP]$, $R[PNY]$ and
their respective lattices are shown in Tables V and VI, and Figs. 5 and
6. □


Table III—Relation R[SPC]

       Serial
       Number   Price   Color

  1    342      13.25   blue
  2    347      13.25   red
  3    410      14.23   red
  4    347       9.45   black
  5    390       9.82   black
  6    392       9.82   red
  7    401       9.82   red
  8    409       9.82   blue
  9    311      22.34   blue
 10    390      30.21   blue
 11    410      28.55   black
 12    423      28.55   black
 13    428      28.55   blue
 14    435      28.55   red

Fig. 3—Relation lattice L(R[SPC]).


Table IV—Relation R[MPC]

             Model
             Number   Price   Color

  1          1234     13.25   blue
  2          1234     13.25   red
  3          1234     14.23   red
  4          1465      9.45   black
  5          1465      9.82   black
  (6, 7)     1465      9.82   red
  8          1465      9.82   blue
  9          1623     22.34   blue
  10         1623     30.21   blue
  (11, 12)   1623     28.55   black
  13         1623     28.55   blue
  14         1654     28.55   red

Fig. 4—Relation lattice L(R[MPC]).


VI. MULTIVALUED DEPENDENCIES


Multivalued dependency (MVD) proposed by Fagin’ and Zaniolo® is 
the necessary and sufficient condition for a (binary) lossless-join 
decomposition. A similar concept, called hierarchical dependency, was 
defined by Delobel.” A bit later, the concept of multivalued depend- 
ency was generalized to join dependency by Rissanen.!*!! A set of 
“axioms” or inference rules for multivalued dependencies was given 
by Beeri, Fagin, and Howard.”? We know from our previous discussion 
that functional dependency is equivalent to partial ordering in the
partition lattice. In this section we show that multivalued dependency
is equivalent to a lattice equation. First, however, we state the
definition of MVD and show that MVD guarantees information-lossless join
decomposition.


Table V—Relation R[MP]

                 Model
                 Number   Price

  (1, 2)         1234     13.25
  3              1234     14.23
  4              1465      9.45
  (5, 6, 7, 8)   1465      9.82
  9              1623     22.34
  10             1623     30.21
  (11, 12, 13)   1623     28.55
  14             1654     28.55


Table VI—Relation R[PNY]

                     Price   Name     Year

  (1, 2)             13.25   pot      1974
  3                  14.23   pot      1975
  4                   9.45   pan      1974
  (5, 6, 7, 8)        9.82   pan      1976
  9                  22.34   kettle   1973
  10                 30.21   kettle   1976
  (11, 12, 13, 14)   28.55   kettle   1975

Fig. 5—Relation lattice L(R[MP]).

Fig. 6—Relation lattice L(R[PNY]).


Definition 5: Let $R$ be a relation on the set of attributes $\Omega = XYZ$,
where $X$, $Y$, and $Z$ are disjoint subsets of $\Omega$. We say there is a
multivalued dependency $X \to\to Y$ if

$$R_{xz}[Y] = R_x[Y], \quad \forall (x) \in R[X],\ (z) \in R[Z].$$


Lemma 5: Let $R$ be a relation on $\Omega = XYZ$, where $X$, $Y$, and $Z$ are
disjoint subsets. Then,

$$R[XYZ] = R[XY] \bowtie R[XZ]$$

iff

$$|R_x[YZ]| = |R_x[Y]| \cdot |R_x[Z]|, \quad \forall (x) \in R[X].$$

Proof: (Necessity) $R[XYZ] = R[XY] \bowtie R[XZ]$ implies

$$R_x[YZ] = R_x[Y] \times R_x[Z], \quad \forall (x) \in R[X].$$

Hence,

$$|R_x[YZ]| = |R_x[Y]| \cdot |R_x[Z]|.$$

(Sufficiency) It is easy to verify that

$$R_x[YZ] \subseteq R_x[Y] \times R_x[Z], \quad \forall (x) \in R[X].$$

The given cardinal identity assures that

$$R_x[YZ] = R_x[Y] \times R_x[Z], \quad \forall (x) \in R[X]. \quad □$$
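
Both sides of Lemma 5 are easy to compare on a concrete instance. The sketch below is illustrative only: it assumes tuples are dictionaries, that $X$, $Y$, and $Z$ are given as lists of attribute names, and the function names are hypothetical.

```python
from collections import defaultdict

def values(row, attrs):
    """The attrs-value of a tuple, as a plain Python tuple."""
    return tuple(row[a] for a in attrs)

def join_is_lossless(rows, X, Y, Z):
    """Test R[XYZ] = R[XY] join R[XZ] directly on sets of tuples."""
    rel = {values(r, X + Y + Z) for r in rows}
    xy = {(values(r, X), values(r, Y)) for r in rows}
    xz = {(values(r, X), values(r, Z)) for r in rows}
    joined = {x + y + z for (x, y) in xy for (x2, z) in xz if x == x2}
    return joined == rel

def cardinality_condition(rows, X, Y, Z):
    """Test |R_x[YZ]| = |R_x[Y]| * |R_x[Z]| for every X-value x."""
    ys, zs, yzs = defaultdict(set), defaultdict(set), defaultdict(set)
    for r in rows:
        x, y, z = values(r, X), values(r, Y), values(r, Z)
        ys[x].add(y)
        zs[x].add(z)
        yzs[x].add((y, z))
    return all(len(yzs[x]) == len(ys[x]) * len(zs[x]) for x in yzs)
```

On any instance the two functions should agree, which is the content of the lemma; the cardinality test avoids materializing the join.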


Theorem 5: Let $R$ be a relation on the set of attributes $\Omega = XYZ$, where
$X$, $Y$, and $Z$ are disjoint subsets.* Then,

$$R[XYZ] = R[XY] \bowtie R[XZ] \quad\text{iff}\quad X \to\to Y.$$


* For convenience, we assume X, Y, and Z to be disjoint. It will later become clear 
that this assumption is not necessary. 

Proof: (Necessity) From Lemma 5, it is sufficient to show that

$$|R_x[YZ]| = |R_x[Y]| \cdot |R_x[Z]|, \quad \forall (x) \in R[X]$$

iff

$$R_{xz}[Y] = R_x[Y], \quad \forall (x) \in R[X],\ (z) \in R[Z].$$

Now

$$R[XYZ] = R[XY] \bowtie R[XZ]$$

implies

$$R_{xz}[Y] \times (x, z) = R_x[Y] \times (x, z), \quad \forall (x) \in R[X],\ (z) \in R[Z].$$

Hence,

$$R_{xz}[Y] = R_x[Y].$$

(Sufficiency) For every $(x) \in R[X]$, we have

$$(x) \times R_x[ZY] = \{(x, z_i) \times R_{xz_i}[Y] \mid (x, z_i) \in R[XZ]\}
= \{(x, z_i) \times R_x[Y] \mid (x, z_i) \in R[XZ]\}
= (x) \times R_x[Z] \times R_x[Y].$$

Since $|x| = 1$, it follows that

$$|R_x[YZ]| = |R_x[Y]| \cdot |R_x[Z]|, \quad \forall (x) \in R[X]. \quad □$$


We need the commutative property of the product of two equivalence 
relations (partitions) to establish the lattice equation of multivalued 
dependency. The product of two equivalence relations may not be an 
equivalence relation; if it is an equivalence relation then the product 
must be commutative and vice versa. 

Definition 6: Two binary relations $\rho$ and $\rho'$ on $S$ are permutable
(commute) if and only if $\rho \circ \rho' = \rho' \circ \rho$. This means that if $a\,\rho\,x\,\rho'\,b$
for some $x \in S$, then $a\,\rho'\,y\,\rho\,b$ for some $y \in S$, and conversely.'®

Lemma 6: Let $\rho$ and $\rho'$ be equivalence relations (partitions) on $S$. Then
the following are equivalent:

1. $\rho \circ \rho' = \rho' \circ \rho$

2. $\rho \circ \rho' = \rho \oplus \rho'$

3. $\rho \circ \rho'$ is an equivalence relation

4. $\rho \circ \rho'$ is symmetric.

Proof: The proof of Lemma 6 is given in Ref. 21.
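
The conditions of Lemma 6 can be explored on small partitions. This is a minimal sketch, assuming a partition is given as a list of blocks and the composition is taken left to right as in Definition 6; the function names are illustrative.

```python
def as_relation(partition):
    """Equivalence relation (set of ordered pairs) induced by a partition."""
    return {(a, b) for block in partition for a in block for b in block}

def compose(r, s):
    """a (r o s) b  iff  a r x and x s b for some x."""
    return {(a, b) for (a, x) in r for (y, b) in s if x == y}

def is_symmetric(r):
    return all((b, a) in r for (a, b) in r)

def permutable(p, q):
    """Condition 1 of Lemma 6 for two partitions p and q."""
    r, s = as_relation(p), as_relation(q)
    return compose(r, s) == compose(s, r)

# Two partitions of {1, 2, 3, 4} that commute, and two of {1, 2, 3} that do not.
p, q = [{1, 2}, {3, 4}], [{1, 3}, {2, 4}]
u, v = [{1, 2}, {3}], [{1}, {2, 3}]
print(permutable(p, q), permutable(u, v))
```

In the non-commuting pair the composition contains $(1, 3)$ but not $(3, 1)$, so it fails to be symmetric, in line with condition 4.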
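Lemma 7: Let $R$ be a relation on the set of attributes $\Omega$ and let $X, Y, Z
\subseteq \Omega$. Then,

$$\theta(X) = \theta(XY) + \theta(XZ) = \theta(XY) \circ \theta(XZ) = \theta(XZ) \circ \theta(XY)$$

iff

$$\theta(X) \subseteq \theta(XY) \circ \theta(XZ).$$

Proof: (Necessity) Trivial.
(Sufficiency) Suppose $t_1\,\theta(XY) \circ \theta(XZ)\,t_2$. Then there exists $t_3 \in R[\Omega]$
such that

$$t_1\,\theta(XY)\,t_3\,\theta(XZ)\,t_2,$$

which implies

$$t_1\,\theta(X)\,t_3\,\theta(X)\,t_2.$$

Therefore,

$$t_1\,\theta(X)\,t_2.$$

Hence,

$$\theta(XY) \circ \theta(XZ) \subseteq \theta(X).$$

It follows that

$$\theta(X) = \theta(XY) \circ \theta(XZ).$$

From Lemma 6, we have

$$\theta(X) = \theta(XY) \oplus \theta(XZ) = \theta(XY) \circ \theta(XZ) = \theta(XZ) \circ \theta(XY).$$

Since

$$\theta(XY) \le \theta(XY) + \theta(XZ) \quad\text{and}\quad \theta(XZ) \le \theta(XY) + \theta(XZ),$$

by the definition of the join operation $\oplus$ in $\Pi(R[\Omega])$, we must have

$$\theta(X) = \theta(XY) \oplus \theta(XZ) \le \theta(XY) + \theta(XZ).$$

But,

$$\theta(XY) + \theta(XZ) \le \theta(X) + \theta(X) = \theta(X),$$

so it follows that

$$\theta(X) = \theta(XY) + \theta(XZ) = \theta(XY) \circ \theta(XZ) = \theta(XZ) \circ \theta(XY). \quad □$$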


The following theorem shows that the multivalued dependency can 
be formulated as a lattice equation. 


Theorem 6: Let $R$ be a relation on the set of attributes $\Omega = XYZ$, where
$X$, $Y$, and $Z$ are disjoint subsets. Then, $R[XYZ] = R[XY] \bowtie R[XZ]$
iff

$$\theta(X) = \theta(XY) + \theta(XZ) = \theta(XY) \circ \theta(XZ) = \theta(XZ) \circ \theta(XY).$$

Proof: (Necessity) Since $R[XYZ] = R[XY] \bowtie R[XZ]$ implies

$$R_x[YZ] = R_x[Y] \times R_x[Z], \quad \forall (x) \in R[X],$$

there is a one-to-one and onto mapping $\phi_x: R_x[YZ] \to R_x[Y] \times R_x[Z]$,
which takes every tuple $(y, z) \in R_x[YZ]$ into $\phi_x((y, z)) = ((y), (z)) \in
R_x[Y] \times R_x[Z]$, $\forall (x) \in R[X]$. Suppose $t_1, t_2 \in R[XYZ]$ and $t_1\,\theta(X)\,t_2$, and
assume $t_1 = (x, y_1, z_1)$ and $t_2 = (x, y_2, z_2)$. As

$$(y_1, z_1), (y_2, z_2) \in R_x[YZ] = R_x[Y] \times R_x[Z],$$

we have

$$(y_1), (y_2) \in R_x[Y] \quad\text{and}\quad (z_1), (z_2) \in R_x[Z].$$

Since $\phi_x$ is an onto mapping, there must exist two tuples $t_3 = (x, y_1,
z_2)$ and $t_4 = (x, y_2, z_1) \in R[XYZ]$. Hence,

$$t_1\,\theta(XY)\,t_3\,\theta(XZ)\,t_2,$$

which means

$$t_1\,\theta(XY) \circ \theta(XZ)\,t_2.$$

It follows that

$$\theta(X) \subseteq \theta(XY) \circ \theta(XZ).$$

From Lemma 7, we have

$$\theta(X) = \theta(XY) + \theta(XZ) = \theta(XY) \circ \theta(XZ) = \theta(XZ) \circ \theta(XY).$$

(Sufficiency) We know $R[XYZ] \subseteq R[XY] \bowtie R[XZ]$. Suppose $t =
(x, y, z) \in R[XY] \bowtie R[XZ]$. Then there exist $t_1 = (x, y, z')$, $t_2 = (x,
y', z) \in R[XYZ]$. Thus,

$$t_1\,\theta(X)\,t_2,$$

which implies

$$t_1\,\theta(XY) \circ \theta(XZ)\,t_2.$$

There must exist $t_3 \in R[XYZ]$ such that

$$t_1\,\theta(XY)\,t_3\,\theta(XZ)\,t_2.$$

Therefore,

$$t_3 = (x, y, z) = t \in R[XYZ],$$

and thus

$$R[XY] \bowtie R[XZ] \subseteq R[XYZ].$$

Hence,

$$R[XYZ] = R[XY] \bowtie R[XZ]. \quad □$$


It should be noted that in the above proof we use the fact that $\Omega =
XYZ$ and $\theta(\Omega) = \theta(XYZ) = 0$, i.e., there are no duplicated tuples in
$R[\Omega]$. The inference rules of MVD are given and proved in Appendix
B.


Example 4: Consider the relation $R[ECSY]$ of Example 1 and
the MVD $E \to\to C$, where $\theta(E) = \pi_1 = \{1, 2;\ 3, 4, 5, 6;\ 7, 8\}$,
$\theta(EC) = \pi_2 = \{1, 2;\ 3, 4;\ 5, 6;\ 7, 8\}$, $\theta(ESY) = \pi_5 = \{1;\ 2;\ 3, 5;\ 4, 6;\ 7;\ 8\}$. It is
easy to verify that

$$\pi_1 = \pi_2 + \pi_5 = \pi_2 \circ \pi_5 = \pi_5 \circ \pi_2.$$
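
The verification in Example 4 can be carried out mechanically on the three partitions listed there. The sketch below assumes the partitions are read as $\{1, 2; 3, 4, 5, 6; 7, 8\}$, $\{1, 2; 3, 4; 5, 6; 7, 8\}$, and $\{1; 2; 3, 5; 4, 6; 7; 8\}$, and that $+$ denotes the join of partitions (the finest partition coarser than both); the function names are illustrative.

```python
def as_relation(partition):
    """Equivalence relation (set of ordered pairs) induced by a partition."""
    return {(a, b) for block in partition for a in block for b in block}

def compose(r, s):
    """a (r o s) b  iff  a r x and x s b for some x."""
    return {(a, b) for (a, x) in r for (y, b) in s if x == y}

def join(p, q):
    """Finest partition coarser than both p and q (merge overlapping blocks)."""
    blocks = [set(b) for b in p]
    for b in q:
        b = set(b)
        touching = [c for c in blocks if c & b]
        merged = b.union(*touching)
        blocks = [c for c in blocks if not (c & b)] + [merged]
    return {frozenset(b) for b in blocks}

pi1 = [{1, 2}, {3, 4, 5, 6}, {7, 8}]
pi2 = [{1, 2}, {3, 4}, {5, 6}, {7, 8}]
pi5 = [{1}, {2}, {3, 5}, {4, 6}, {7}, {8}]

r1, r2, r5 = as_relation(pi1), as_relation(pi2), as_relation(pi5)
print(r1 == compose(r2, r5) == compose(r5, r2))
print(join(pi2, pi5) == {frozenset(b) for b in pi1})
```

Both comparisons should come out true for these three partitions, which is the lattice equation of Theorem 6 specialized to this example.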


It is known that if $R$ is a relation on $\Omega = XYZ$, and $X \to\to Y$, then
$X \to\to Z$. The symmetry of the MVD can easily be seen in the
lattice equation of Lemma 7.

If $XYZ \subseteq \Omega$ and $\theta(X) = \theta(XY) + \theta(XZ) = \theta(XY) \circ \theta(XZ) = \theta(XZ) \circ \theta(XY)$
holds, then $X \to\to Y \mid Z$ is called an embedded multivalued
dependency (EMVD)’; this is simply a multivalued dependency in the 
projection R[X YZ] of R[Q]. 

Theorem 6 clearly indicates that the MVD is actually a condition 
pertaining to data independency rather than data dependency. For this 
reason, we introduce the notion of decomposition of two sets of 
attributes in a relation as follows. 


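Definition 7: Let $R$ be a relation on the set of attributes $\Omega$. The two
sets of attributes $\Omega_1, \Omega_2 \subseteq \Omega$ are decomposable in $R$ if

$$\theta(\Omega_1 + \Omega_2) = \theta(\Omega_1) + \theta(\Omega_2) = \theta(\Omega_1) \circ \theta(\Omega_2) = \theta(\Omega_2) \circ \theta(\Omega_1).$$

It is easy to see that $\Omega_1$ and $\Omega_2$ are decomposable in $R$ iff
$\Omega_1 + \Omega_2 \to\to \Omega_1 - \Omega_2 \mid \Omega_2 - \Omega_1$ is an EMVD in $R$. Furthermore, if $\Omega_1\Omega_2 = \Omega$ then
$\Omega_1 + \Omega_2 \to\to \Omega_1 - \Omega_2$ (or $\Omega_1 + \Omega_2 \to\to \Omega_2 - \Omega_1$) is an MVD in $R$. In the
latter case, $(\Omega_1, \Omega_2)$ is called a decomposition pair by Armstrong and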
Delobel.” 

We feel that decomposition is a basic concept in the study of the 
structure of databases. It can be naturally generalized to the concepts 
of projective decomposition and mutual decomposition. Projective
decomposition concerns the data independence of two sets of attributes
on the projection of a relation. Mutual decomposition extends the 
concept of decomposition to more than two sets of attributes. 

Let $\rho$ be a partition on the set $S$. The function $\rho^*: S \to S/\rho$, which maps
$a \in S$ into $(a)\rho^* = a_\rho$, is called the canonical function of $\rho$. For $S = \{a,
b, \dots, e\}$, we will use the notation

$$\rho^* = \begin{pmatrix} a & b & \cdots & e \\ a_\rho & b_\rho & \cdots & e_\rho \end{pmatrix}$$

to illustrate the canonical function $\rho^*$. The equivalence relation

$$\ker \rho^* = \rho^* \circ (\rho^*)^{-1} = \{(a, b) \in S \times S \mid \rho^*(a) = \rho^*(b)\}$$

is called the kernel of $\rho^*$. Notice that $\ker \rho^* = \rho$.

Let $\rho$ and $\sigma$ be partitions on $S$ with $\rho \le \sigma$; then there is a unique
function $f$ from $S/\rho$ onto $S/\sigma$ such that $(a_\rho)f = a_\sigma$. The kernel of $f$,

$$\ker f = f \circ f^{-1} = \{(a_\rho, b_\rho) \in S/\rho \times S/\rho \mid a\,\sigma\,b\},$$

is an equivalence on $S/\rho$. It is usual to write $\ker f$ as $\sigma/\rho$, the quotient
of $\sigma$ and $\rho$. Note that $a_\rho\,(\sigma/\rho)\,b_\rho$ if and only if $a\,\sigma\,b$, and the mapping $g$:
$(S/\rho)/(\sigma/\rho) \to S/\sigma$ defined by $((a_\rho)_{\sigma/\rho})g = a_\sigma$ is one-to-one and onto.
Thus the function $f$ defined above is in fact the canonical function of
$\sigma/\rho$, i.e., $f = (\sigma/\rho)^*$. It is easy to see the diagram in Fig. 7 commutes,
that is $\rho^* \circ (\sigma/\rho)^* = \sigma^*$.











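Example 5: Let $\rho$, $\sigma$ be partitions on the set $S = \{1, 2, 3, 4, 5, 6, 7, 8\}$,
such that $\rho = \{1, 2;\ 3, 4;\ 5, 6;\ 7, 8\}$ and $\sigma = \{1, 2, 3, 4;\ 5, 6, 7, 8\}$ with
$\rho \le \sigma$. Then $S/\rho = \{I, II, III, IV\}$, where $I = \{1, 2\}$, $II = \{3, 4\}$, $III = \{5, 6\}$,
$IV = \{7, 8\}$, and $S/\sigma = \{\alpha, \beta\}$, where $\alpha = \{1, 2, 3, 4\}$, $\beta = \{5, 6, 7, 8\}$. The
canonical functions of $\rho$ and $\sigma$ are

$$\rho^* = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ I & I & II & II & III & III & IV & IV \end{pmatrix}$$

and

$$\sigma^* = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \alpha & \alpha & \alpha & \alpha & \beta & \beta & \beta & \beta \end{pmatrix}.$$

It follows that

$$\sigma/\rho = \{I, II;\ III, IV\}$$

and

$$(\sigma/\rho)^* = \begin{pmatrix} I & II & III & IV \\ \alpha & \alpha & \beta & \beta \end{pmatrix}. \quad □$$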
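
Example 5 can be reproduced with a few lines of code. This is a minimal sketch, assuming partitions are lists of blocks with $\rho \le \sigma$, and that a block of $\rho$ stands for the corresponding element of $S/\rho$; the names `canonical` and `quotient` are illustrative.

```python
def canonical(partition):
    """Canonical function: map each element of S to its block."""
    return {a: frozenset(block) for block in partition for a in block}

def quotient(sigma, rho):
    """sigma/rho: group the blocks of rho lying in the same block of sigma."""
    rho_star, sigma_star = canonical(rho), canonical(sigma)
    groups = {}
    for a, block in rho_star.items():
        groups.setdefault(sigma_star[a], set()).add(block)
    return {frozenset(g) for g in groups.values()}

rho = [{1, 2}, {3, 4}, {5, 6}, {7, 8}]
sigma = [{1, 2, 3, 4}, {5, 6, 7, 8}]
I, II, III, IV = (frozenset(b) for b in rho)

q = quotient(sigma, rho)
print(q == {frozenset({I, II}), frozenset({III, IV})})
```

The commuting diagram of Fig. 7 corresponds to the fact that the union of the $\rho$-blocks in the quotient block containing $\rho^*(a)$ is exactly the $\sigma$-block $\sigma^*(a)$.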


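Lemma 8: Let $\rho, \sigma_1, \sigma_2$ be partitions on $S$ such that $\rho \le \sigma_1$, $\rho \le \sigma_2$, and
$\sigma_1 \circ \sigma_2 = \sigma_2 \circ \sigma_1$. Then

$$(\sigma_1 \circ \sigma_2)/\rho = (\sigma_1/\rho) \circ (\sigma_2/\rho).$$

[Commutative diagram: $\rho^*: S \to S/\rho$, $(\sigma/\rho)^*: S/\rho \to S/\sigma$, $\sigma^*: S \to S/\sigma$.]

Fig. 7—Canonical function of quotient partition.

Proof: It is clear that $\sigma_1 \circ \sigma_2$ is a partition on $S$ and $\rho \le \sigma_1 \circ \sigma_2 = \sigma_2 \circ
\sigma_1$. It follows from the definition of quotient partition that the lemma
is true.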


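Lemma 9: Let $\rho, \sigma_1, \sigma_2$ be partitions on $S$, such that $\rho \le \sigma_1$, $\rho \le \sigma_2$.
Then

$$\sigma_1 \circ \sigma_2 = \sigma_2 \circ \sigma_1$$

iff

$$(\sigma_1/\rho) \circ (\sigma_2/\rho) = (\sigma_2/\rho) \circ (\sigma_1/\rho).$$

Proof: (Necessity)

$$(\sigma_1/\rho) \circ (\sigma_2/\rho) = (\sigma_1 \circ \sigma_2)/\rho = (\sigma_2 \circ \sigma_1)/\rho = (\sigma_2/\rho) \circ (\sigma_1/\rho).$$

(Sufficiency) Suppose $a\,\sigma_1 \circ \sigma_2\,b$. Then there is a $c \in S$ such that
$a\,\sigma_1\,c\,\sigma_2\,b$. It follows that

$$a_\rho\,(\sigma_1/\rho)\,c_\rho\,(\sigma_2/\rho)\,b_\rho.$$

There must be a $d \in S$ such that

$$a_\rho\,(\sigma_2/\rho)\,d_\rho\,(\sigma_1/\rho)\,b_\rho.$$

Thus

$$a\,\sigma_2\,d\,\sigma_1\,b,$$

and

$$a\,\sigma_2 \circ \sigma_1\,b.$$

Hence

$$\sigma_1 \circ \sigma_2 \subseteq \sigma_2 \circ \sigma_1.$$

Similarly, we have $\sigma_2 \circ \sigma_1 \subseteq \sigma_1 \circ \sigma_2$. Then $\sigma_1 \circ \sigma_2 = \sigma_2 \circ \sigma_1$.
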
Definition 8: Let $R$ be a relation on the set of attributes $\Omega$. For $\Omega_1, \Omega_2
\subseteq \Omega$, the projective partition defined by

$$\theta(\Omega_1 \mid \Omega_2) = \theta(\Omega_1 + \Omega_2)/\theta(\Omega_2)$$

is a partition on the set of tuples of $R[\Omega]/\theta(\Omega_2) = R[\Omega_2]$. The canonical
function of $\theta(\Omega_1 \mid \Omega_2)$ is denoted by $\theta^*(\Omega_1 \mid \Omega_2) = (\theta(\Omega_1 + \Omega_2)/\theta(\Omega_2))^*$,
which satisfies $\theta^*(\Omega_2) \circ \theta^*(\Omega_1 \mid \Omega_2) = \theta^*(\Omega_1 + \Omega_2)$.

Certain properties of the projective partition are demonstrated in
the following theorems; their proofs follow directly from the
definition of projective partition.

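Theorem 7: Let $R$ be a relation on the set of attributes $\Omega$ and $\Omega_1, \dots,
\Omega_n \subseteq \Omega$. Then

$$\theta^*\Bigl(\sum_{i=1}^{n} \Omega_i\Bigr) = \theta^*(\Omega_1) \circ \theta^*(\Omega_2 \mid \Omega_1) \circ \cdots \circ \theta^*\Bigl(\Omega_n \Bigm| \sum_{i=1}^{n-1} \Omega_i\Bigr).$$

Theorem 8: Let $R$ be a relation on the set of attributes $\Omega$ and $X, \Omega_1,
\dots, \Omega_n \subseteq \Omega$, such that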


Then 
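1. $\theta(X) = \prod_{k=1}^{n} \ker(\theta^*(\Omega_k) \circ \theta^*(X \mid \Omega_k))$

2. $\theta(\Omega_m \mid X) = \ker(\theta^*(\Omega_m) \circ \theta^*(X \mid \Omega_m)) \big/ \prod_{k=1}^{n} \ker(\theta^*(\Omega_k) \circ \theta^*(X \mid \Omega_k))$. □
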
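Definition 9: Let $R$ be a relation on the set of attributes $\Omega$ and $\Omega_1, \Omega_2,
\Sigma \subseteq \Omega$. We say $\Omega_1$ and $\Omega_2$ are projectively decomposable on $\Sigma$ if

$$\theta(\Omega_1 + \Omega_2 \mid \Sigma) = \theta(\Omega_1 \mid \Sigma) + \theta(\Omega_2 \mid \Sigma)
= \theta(\Omega_1 \mid \Sigma) \circ \theta(\Omega_2 \mid \Sigma)
= \theta(\Omega_2 \mid \Sigma) \circ \theta(\Omega_1 \mid \Sigma). \quad □$$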


The EMVD is a special case of projective decomposition, which can 
be seen from the following theorem. 


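Theorem 9: Let $R$ be a relation on $\Omega$, and let $\Omega_1, \Omega_2, \Sigma \subseteq \Omega$. Then $\Omega_1$
and $\Omega_2$ are projectively decomposable on $\Sigma$ iff

$$\theta(\Omega_1 + \Omega_2 + \Sigma) = \theta(\Omega_1 + \Sigma) + \theta(\Omega_2 + \Sigma)
= \theta(\Omega_1 + \Sigma) \circ \theta(\Omega_2 + \Sigma) = \theta(\Omega_2 + \Sigma) \circ \theta(\Omega_1 + \Sigma).$$

Proof: The proof follows from Lemmas 8 and 9. □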

Example 6: Consider the relation $R$ on $\Omega = ABCDE$ in Table VII.
The Hasse diagram of the relation lattice $L(R[\Omega])$ is shown in Fig. 8,
where

$$\pi_1 = \{1, 2, 3, 5, 6, 7;\ 4\} = \theta(A),$$
$$\pi_2 = \{1, 3, 4;\ 2, 5, 6, 7\} = \theta(B),$$
$$\pi_3 = \{1, 6, 7;\ 2, 3, 4, 5\} = \theta(C),$$
$$\pi_4 = \{1, 3;\ 2, 5, 6, 7;\ 4\} = \theta(AB),$$
$$\pi_5 = \{1, 6, 7;\ 2, 3, 5;\ 4\} = \theta(AC),$$
$$\pi_6 = \{1;\ 2, 5;\ 3, 4;\ 6, 7\} = \theta(BC),$$
$$\pi_7 = \{1, 6, 7;\ 2, 5;\ 3;\ 4\} = \theta(D),$$
$$\pi_8 = \{1, 3;\ 2, 6;\ 4;\ 5, 7\} = \theta(E).$$


Table VII—Relation $R[\Omega]$

       A     B     C     D     E

  1    a1    b1    c1    d1    e1
  2    a1    b2    c2    d2    e2
  3    a1    b1    c2    d3    e1
  4    a2    b1    c2    d4    e3
  5    a1    b2    c2    d2    e4
  6    a1    b2    c1    d1    e2
  7    a1    b2    c1    d1    e4

Fig. 8—Relation lattice $L(R[\Omega])$.


Let $\Omega_1 = ABD$, $\Omega_2 = ACE$, $\Sigma = ABC$. We find that

$$\theta(\Omega_1 + \Omega_2 \mid \Sigma) = \theta(A \mid ABC) = \theta(A)/\theta(ABC) = \{I, II, III, V;\ IV\},$$
$$\theta(\Omega_1 \mid \Sigma) = \theta(ABD \mid ABC) = \theta(AB)/\theta(ABC) = \{I, III;\ II, V;\ IV\},$$
$$\theta(\Omega_2 \mid \Sigma) = \theta(ACE \mid ABC) = \theta(AC)/\theta(ABC) = \{I, V;\ II, III;\ IV\},$$

and

$$\theta(ABC) = \{I, II, III, IV, V\},$$

where $I = \{1\}$, $II = \{2, 5\}$, $III = \{3\}$, $IV = \{4\}$, $V = \{6, 7\}$.
It is easy to see that $ABD$ and $ACE$ are projectively decomposable
on $ABC$, i.e.,

$$\theta(A \mid ABC) = \theta(ABD \mid ABC) + \theta(ACE \mid ABC)
= \theta(ABD \mid ABC) \circ \theta(ACE \mid ABC)
= \theta(ACE \mid ABC) \circ \theta(ABD \mid ABC).$$

But the MVD $A \to\to BD$ (or $A \to\to CE$) does not hold in $R[\Omega]$.
Nevertheless, the EMVD $A \to\to B \mid C$ does hold in $R[\Omega]$. □
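
The computations of Example 6 can be replayed in code. The sketch below is illustrative: the tuple values are an assumption reconstructed to agree with the partitions $\theta(A), \dots, \theta(E)$ listed in the example, the helper names are hypothetical, and only the two composition equalities of Definition 9 are tested.

```python
TABLE = """
1 a1 b1 c1 d1 e1
2 a1 b2 c2 d2 e2
3 a1 b1 c2 d3 e1
4 a2 b1 c2 d4 e3
5 a1 b2 c2 d2 e4
6 a1 b2 c1 d1 e2
7 a1 b2 c1 d1 e4
"""
ROWS = {int(f[0]): dict(zip("ABCDE", f[1:]))
        for f in (line.split() for line in TABLE.strip().splitlines())}

def theta(attrs):
    """Partition of tuple numbers induced by agreement on attrs."""
    blocks = {}
    for t, row in ROWS.items():
        blocks.setdefault(tuple(row[a] for a in attrs), set()).add(t)
    return {frozenset(b) for b in blocks.values()}

def quotient(coarse, fine):
    """coarse/fine: group the blocks of fine inside each block of coarse."""
    return {frozenset(f for f in fine if f <= c) for c in coarse}

def relation(partition):
    return {(a, b) for blk in partition for a in blk for b in blk}

def compose(r, s):
    return {(a, b) for (a, x) in r for (y, b) in s if x == y}

def common(x, y):
    """X + Y in the paper's notation: the attributes shared by X and Y."""
    return "".join(a for a in x if a in y)

def projective(x, sigma):
    """theta(X | Sigma) = theta(X + Sigma) / theta(Sigma)."""
    return quotient(theta(common(x, sigma)), theta(sigma))

def projectively_decomposable(o1, o2, sigma):
    lhs = relation(projective(common(o1, o2), sigma))
    p1, p2 = relation(projective(o1, sigma)), relation(projective(o2, sigma))
    return lhs == compose(p1, p2) == compose(p2, p1)

print(projectively_decomposable("ABD", "ACE", "ABC"))
```

With the same helpers, comparing `relation(theta("A"))` against the composition of `relation(theta("ABD"))` and `relation(theta("ACE"))` shows why the plain MVD $A \to\to BD$ fails on this instance.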

So far we have discussed the properties of decomposition of two sets 
of attributes. The concept of decomposition certainly can be extended 
to any n > 2 sets of attributes. We define the notion of mutual 
decomposition as follows: 
Definition 10: Let $R$ be a relation on the set of attributes $\Omega$. The
sets of attributes $\Omega_1, \Omega_2, \dots, \Omega_n \subseteq \Omega$ are mutually decomposable, if for
any $I \subseteq N = \{1, \dots, n\}$ and $J \subseteq N - I$, the two sets of attributes
$\Omega_I = \bigcup_{i \in I} \Omega_i$ and $\Omega_J = \bigcup_{j \in J} \Omega_j$ are decomposable. □

Theorem 10: Let $R$ be a relation on the set of attributes $\Omega$ and $\Omega_1 \cdots
\Omega_n = \Omega$. Suppose $\Omega_1, \dots, \Omega_n$ are mutually decomposable. Then

$$R[\Omega] = R[\Omega_1 \cdots \Omega_n] = R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n].$$

Proof: It follows from the definition of mutual decomposition that

$$R[\Omega_1 \cdots \Omega_m] = R[\Omega_1 \cdots \Omega_{m-1}] \bowtie R[\Omega_m], \quad m = 2, \dots, n.$$

Therefore the assertion is true by induction. □

The above theorem states that mutual decomposition implies an 
information-lossless join. The converse is not true in general. The 
necessary and sufficient condition of an information-lossless join is 
called join dependency, which will be discussed in the next section. 


VII. JOIN DEPENDENCIES 
Join dependency (JD)*°"»" is a generalization of MVD. It refers to 


a collection $\{\Omega_1, \dots, \Omega_n\}$ of subsets of $\Omega$ such that

$$\Omega = \Omega_1 \cdots \Omega_n$$

and

$$R[\Omega] = R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n].$$
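
Whether a given instance satisfies such a join dependency can be tested by projecting and rejoining. This is a minimal sketch, assuming tuples are dictionaries, that each component is an iterable of attribute names, and that the components together cover all attributes; the function names are illustrative.

```python
def natural_join(r, s):
    """Natural join of two relations stored as sets of frozenset(items)."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out

def satisfies_jd(rows, components):
    """True iff the relation equals the join of its projections on components."""
    full = {frozenset(r.items()) for r in rows}
    joined = None
    for c in components:
        proj = {frozenset((a, r[a]) for a in c) for r in rows}
        joined = proj if joined is None else natural_join(joined, proj)
    return joined == full

rows = [dict(A=a, B=b, C=c) for a, b, c in
        [("a1", "b1", "c2"), ("a1", "b2", "c1"), ("a2", "b1", "c1"), ("a1", "b1", "c1")]]
print(satisfies_jd(rows, ["AB", "BC", "CA"]), satisfies_jd(rows, ["AB", "BC"]))
```

The four-tuple instance used here is a standard illustration of a three-way join dependency that does not reduce to the binary decomposition on $AB$ and $BC$.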


Join dependency can be considered as a “set of coordinates” of the 
relation. The connection between join dependencies and multivalued 
dependencies is given by the following lemma: 

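Lemma 10: Let $R[\Omega] = R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n]$, let $N_0$ be a subset of
$\{1, \dots, n\}$, and let $N_1 = \{1, \dots, n\} - N_0$. Then $(\Omega_{N_0}, \Omega_{N_1})$ is a
decomposition pair, where

$$\Omega = \Omega_1 \cdots \Omega_n, \quad \Omega_{N_0} = \bigcup_{i \in N_0} \Omega_i, \quad\text{and}\quad \Omega_{N_1} = \bigcup_{i \in N_1} \Omega_i.$$

Proof: Since

$$R\Bigl[\bigcup_{i \in N_0} \Omega_i\Bigr] \subseteq \mathop{\bowtie}_{i \in N_0} R[\Omega_i]$$

and

$$R\Bigl[\bigcup_{i \in N_1} \Omega_i\Bigr] \subseteq \mathop{\bowtie}_{i \in N_1} R[\Omega_i],$$

it follows that

$$R[\Omega_{N_0}] \bowtie R[\Omega_{N_1}] \subseteq \Bigl(\mathop{\bowtie}_{i \in N_0} R[\Omega_i]\Bigr) \bowtie \Bigl(\mathop{\bowtie}_{i \in N_1} R[\Omega_i]\Bigr).$$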

Since the natural join operation is commutative and associative,° we 
have 


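$$R[\Omega_{N_0}] \bowtie R[\Omega_{N_1}] \subseteq R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n] = R[\Omega].$$

But we know

$$R[\Omega] \subseteq R[\Omega_{N_0}] \bowtie R[\Omega_{N_1}].$$

Hence,

$$R[\Omega] = R[\Omega_{N_0}] \bowtie R[\Omega_{N_1}].$$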


Let $x$ be an $X$-value, and assume $Y \subseteq X$. We shall denote the
$Y$-value in $x$ as $x[Y]$. Let $t \in R[\Omega]$ be a tuple and let $\Omega = \Omega_1 \cdots \Omega_n$.
The notation $t \triangleq (w_1, \dots, w_n)$ will be used to indicate that $t[\Omega_i] = w_i$,
$\forall i \in N$, where $N = \{1, \dots, n\}$ denotes the index set.

Before we state the necessary and sufficient conditions for join 
dependency, we first introduce the concepts of a set of consistent values 
and an indexed family of tuples. 

Definition 11: Let $R$ be a relation on the set of attributes $\Omega$, and let
$\{X_i \mid i \in N\}$ be a collection of subsets of $\Omega$. The set of values $\{x_i \mid x_i$ is
an $X_i$-value, $i \in N\}$ is called a set of consistent values of $\{X_i \mid i \in N\}$ if
the values of $X_i \cap X_j$ in $x_i$ and $x_j$ agree, i.e., if

$$x_i[X_i \cap X_j] = x_j[X_i \cap X_j], \quad \forall i, j \in N.$$

The set of tuples $\{t_i \mid i \in N\}$ of $R[\Omega]$ is called an indexed family of
tuples with respect to $\{X_i \mid i \in N\}$ if $\{x_i \mid t_i[X_i] = x_i,\ i \in N\}$ is a set of
consistent values. □


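Theorem 11: Let $R$ be a relation on the set of attributes $\Omega$, and let $\Omega =
\Omega_1 \cdots \Omega_n$. Then

$$R[\Omega] = R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n]$$

iff for every indexed family of tuples $\{t_i \mid i \in N\}$ with respect to $\{X_i \mid i \in
N\}$ there is a tuple $t \in R[\Omega]$ such that $t[\Omega_i] = t_i[\Omega_i]$, $\forall i \in N$, where $X_i =
\Omega_i \cap \bar\Omega_i$ and $\bar\Omega_i = \bigcup_{j \ne i} \Omega_j$.

Proof: (Necessity) Let $\{t_i \mid i \in N\}$ be an indexed family of tuples of
$R[\Omega]$ with respect to $\{X_i \mid i \in N\}$. Thus, $\{x_i \mid t_i[X_i] = x_i,\ i \in N\}$ is a set
of consistent values. Suppose $t_i[\Omega_i] = w_i$, $i \in N$. We want to show that
there exists a tuple $t \triangleq (w_1, \dots, w_n) \in R[\Omega]$. We will prove this by
mathematical induction. We know that $s_1 = (w_1) \in R[\Omega_1]$ and $(w_2) \in
R[\Omega_2]$. Thus,

$$w_1[X_1] = x_1 \quad\text{and}\quad w_2[X_2] = x_2.$$

Since $\{x_i \mid i \in N\}$ is consistent, it follows that

$$w_1[X_1 \cap X_2] = x_1[X_1 \cap X_2] = x_2[X_1 \cap X_2] = w_2[X_1 \cap X_2].$$

It is known that

$$X_i \cap X_j = (\Omega_i \cap \bar\Omega_i) \cap (\Omega_j \cap \bar\Omega_j) = \Omega_i \cap \Omega_j, \quad i \ne j.$$

Therefore,

$$w_1[\Omega_1 \cap \Omega_2] = w_2[\Omega_1 \cap \Omega_2].$$

By the definition of natural join, we know that there exists a tuple
$s_2 \triangleq (w_1, w_2) \in R[\Omega_1] \bowtie R[\Omega_2]$.

Suppose there is a tuple $s_{n-1} \triangleq (w_1, \dots, w_{n-1}) \in R[\Omega_1] \bowtie \cdots
\bowtie R[\Omega_{n-1}]$. Then

$$s_{n-1}[X_i \cap X_n] = w_i[X_i \cap X_n] = x_i[X_i \cap X_n]
= x_n[X_i \cap X_n] = w_n[X_i \cap X_n], \quad i = 1, \dots, n-1.$$

Hence,

$$s_{n-1}[\bar\Omega_n \cap \Omega_n] = s_{n-1}[\Omega_n \cap (\Omega_1 \cup \cdots \cup \Omega_{n-1})]$$
$$= s_{n-1}[(\Omega_1 \cap \Omega_n) \cup \cdots \cup (\Omega_{n-1} \cap \Omega_n)]$$
$$= s_{n-1}[(X_1 \cap X_n) \cup \cdots \cup (X_{n-1} \cap X_n)]$$
$$= w_n[(X_1 \cap X_n) \cup \cdots \cup (X_{n-1} \cap X_n)]$$
$$= w_n[\bar\Omega_n \cap \Omega_n].$$

It follows that there exists a tuple $t$ such that

$$t = s_n \triangleq (w_1, \dots, w_n) \in R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n] = R[\Omega].$$

(Sufficiency) We know that

$$R[\Omega] \subseteq R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n].$$

For any $t \triangleq (w_1, \dots, w_n) \in R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n]$, there exists an
indexed family of tuples $\{t_i \mid t_i[\Omega_i] = w_i,\ i \in N\}$ of $R[\Omega]$ with respect to
$\{X_i \mid i \in N\}$ that has a set of consistent values $\{x_i \mid w_i[X_i] = x_i,\ i \in N\}$.
It follows that $t \triangleq (w_1, \dots, w_n) \in R[\Omega]$. Hence,

$$R[\Omega] = R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n]. \quad □$$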


The necessary and sufficient conditions for JD given in the above 
theorem are similar to the notion of template dependency introduced
by Sadri and Ullman.”’ The following condition can be considered as 
an extension of the binary natural join operation. 


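Corollary 3: Let $R$ be a relation on the set of attributes $\Omega$, and $\Omega = \Omega_1
\cdots \Omega_n$. Then,

$$R[\Omega] = R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n]$$

iff

$$R_{x_1, \dots, x_n}[\Omega] = R_{x_1}[\Omega_1] \bowtie \cdots \bowtie R_{x_n}[\Omega_n]$$

for every set of consistent values $\{x_i \mid i \in N\}$ of $\{X_i \mid i \in N\}$, where $X_i =
\Omega_i \cap \bar\Omega_i$, $\forall i \in N$, and $R_{x_1, \dots, x_n}[\Omega] = \{t \mid t \in R[\Omega],\ t[X_i] = x_i,\ \forall i \in N\}$.

Proof: The proof follows from Theorem 11. □

Clearly, for any $t \in R[\Omega] = R[\Omega_1 \cdots \Omega_n]$, the set of values $\{x_i \mid t[X_i]
= x_i,\ X_i = \Omega_i \cap \bar\Omega_i,\ i \in N\}$ is always consistent. The converse is not
necessarily true. Suppose for any set of consistent values $\{x_i \mid x_i$ is an
$X_i$-value, $i \in N\}$ there is a tuple $t \in R[\Omega]$ such that $t[X_i] = x_i$, $\forall i \in N$;
in this case we say $\{\Omega_i \mid i \in N\}$ is complete.

Corollary 4: Let $R$ be a relation on the set of attributes $\Omega$, and $\Omega = \Omega_1
\cdots \Omega_n$. Then $\{\Omega_i \mid i \in N\}$ is complete iff

$$R[X_1 \cdots X_n] = R[X_1] \bowtie \cdots \bowtie R[X_n],$$

where $X_i = \Omega_i \cap \bar\Omega_i$, $i \in N$.

Proof: The proof follows directly from Theorem 11.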

The necessary and sufficient conditions for JD may be stated in a 
different form, as follows: 


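Theorem 12: Let $R$ be a relation on the set of attributes $\Omega$, and $\Omega = \Omega_1
\cdots \Omega_n$. Then

$$R[\Omega] = R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n]$$

iff

1. $(\Omega_i, \bar\Omega_i)$ is a decomposition pair, $i \in N$,

2. $\{\Omega_i \mid i \in N\}$ is complete, i.e.,

$$R[X_1 \cdots X_n] = R[X_1] \bowtie \cdots \bowtie R[X_n], \quad X_i = \Omega_i \cap \bar\Omega_i,\ i \in N.$$

Proof: (Necessity) Condition 1 follows from Lemma 10. Condition 2 is
a consequence of Theorem 11.
(Sufficiency) We know that

$$R[\Omega] \subseteq R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n].$$

Suppose $t \triangleq (w_1, \dots, w_n) \in R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n]$. Then there is
an indexed family of tuples $\{t_i \mid t_i[\Omega_i] = w_i,\ i \in N\}$ of $R[\Omega]$ with respect
to $\{X_i \mid i \in N\}$ and the set of consistent values $\{x_i \mid w_i[X_i] =
x_i,\ X_i = \Omega_i \cap \bar\Omega_i,\ i \in N\}$. We will prove by mathematical induction that
$t \triangleq (w_1, \dots, w_n) \in R[\Omega]$.

Since $\{\Omega_i \mid i \in N\}$ is complete, there exists a tuple $s \triangleq (y_1, \dots, y_n)
\in R[\Omega]$ such that

$$s[X_i] = x_i, \quad \forall i \in N.$$

We know that

$$t_1[\Omega_1 \cap \bar\Omega_1] = t_1[X_1] = w_1[X_1] = x_1 = s[X_1] = s[\Omega_1 \cap \bar\Omega_1],$$

which means

$$t_1\,\theta(\Omega_1 \cap \bar\Omega_1)\,s.$$

Since $(\Omega_1, \bar\Omega_1)$ is a decomposition pair, there exists a tuple $s_1 \in R[\Omega]$
such that

$$t_1\,\theta(\Omega_1)\,s_1\,\theta(\bar\Omega_1)\,s.$$

Hence,

$$s_1 \triangleq (w_1, y_2, \dots, y_n) \in R[\Omega].$$

Suppose there is a tuple $s_{n-1} \triangleq (w_1, \dots, w_{n-1}, y_n) \in R[\Omega]$. It follows
that

$$t_n[\Omega_n \cap \bar\Omega_n] = t_n[X_n] = w_n[X_n] = x_n = s_{n-1}[X_n] = s_{n-1}[\Omega_n \cap \bar\Omega_n].$$

Thus there is a tuple $s_n \in R[\Omega]$ such that

$$t_n\,\theta(\Omega_n)\,s_n\,\theta(\bar\Omega_n)\,s_{n-1}.$$

Hence

$$t = s_n \triangleq (w_1, \dots, w_n) \in R[\Omega]. \quad □$$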


It is known that a special class of JD, called acyclic join dependency, 
has many desirable properties; this class makes operations like updates 
and joins especially easy.’ 8 A collection of subsets {0;|i € N} of the 
set of attributes Q is called acyclic if all the attributes can be deleted 
by repeatedly applying the following two operations:’>”8 

1. Delete from some $\Omega_i$ an attribute $A$ that appears in no other $\Omega_j$;

2. Delete one $\Omega_i$ if there is an $\Omega_j$, $i \ne j$, such that $\Omega_i \subseteq \Omega_j$.
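
The two deletion operations translate directly into a reduction loop. This is a minimal sketch, assuming each $\Omega_i$ is given as an iterable of single-character attribute names; the function name `is_acyclic` is illustrative.

```python
def is_acyclic(collection):
    """Apply operations 1 and 2 until nothing changes; acyclic iff all attributes vanish."""
    sets = [set(s) for s in collection]
    changed = True
    while changed:
        changed = False
        # Operation 1: delete an attribute that appears in exactly one set.
        for s in sets:
            for a in list(s):
                if sum(a in t for t in sets) == 1:
                    s.remove(a)
                    changed = True
        # Operation 2: delete one set that is contained in another set.
        for i, s in enumerate(sets):
            if any(i != j and s <= t for j, t in enumerate(sets)):
                del sets[i]
                changed = True
                break
    return all(not s for s in sets)

print(is_acyclic(["AB", "BC", "CD"]), is_acyclic(["AB", "BC", "CA"]))
```

The chain $\{AB, BC, CD\}$ reduces completely, while on the triangle $\{AB, BC, CA\}$ neither operation applies, so the attributes cannot all be deleted.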

A reduction $\{Y_j \mid j \in J \subseteq N$, and $\forall i \in N - J\ \exists j \in J$ such that $Y_i \subseteq
Y_j\}$ is obtained by removing from $\{Y_i \mid i \in N\}$ each $Y_i$ that is contained
in another $Y_j$.

Definition 12: Let $S = \{\Omega_i \mid i \in N\}$ be a collection of subsets of $\Omega$. The
core of $S$, denoted by $\bar S$, is defined as follows:

1. $\bar S = \emptyset$, for $|S| = N = 1$.

2. $\bar S$ is the reduction of $\{\Omega_i \cap \bar\Omega_i \mid i \in N\}$, for $|S| = N > 1$.

There are many different but equivalent conditions that characterize
a collection of subsets as acyclic.!° We will use the following one: 
Lemma 11: A collection $S = \{\Omega_i \mid i \in N\}$ of subsets of $\Omega$ is acyclic iff its
core $\bar S$ is acyclic.

Proof: $\bar S$ can be obtained from $S$ by performing the operations 1 and
2 defined above. It follows that if $S$ is acyclic then $\bar S$ is acyclic and
vice versa.
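
The core of Definition 12 can be computed directly from its definition. This is a minimal sketch, assuming each $\Omega_i$ is an iterable of single-character attribute names and that duplicate sets in the reduction are collapsed to one copy; the function names are illustrative.

```python
def reduction(sets):
    """Drop every set that is properly contained in another (duplicates collapse)."""
    unique = {frozenset(s) for s in sets}
    return {s for s in unique if not any(s < t for t in unique)}

def core(collection):
    """Core of S: empty for |S| = 1, else the reduction of the sets
    Omega_i intersected with the union of all other Omega_j."""
    sets = [frozenset(s) for s in collection]
    if len(sets) == 1:
        return set()
    shared = []
    for i, s in enumerate(sets):
        others = frozenset().union(*(t for j, t in enumerate(sets) if j != i))
        shared.append(s & others)
    return reduction(shared)

chain = ["AB", "BC", "CD"]
triangle = ["AB", "BC", "CA"]
print(core(chain), len(core(triangle)))
```

For the acyclic chain the core shrinks to the single set $BC$, while for the triangle the core has as many sets as the original collection, which matches the size argument used for acyclic collections in this section.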
Corollary 5: Let S = {Q;|i € N} be an acyclic collection of subsets of Q. 
Then |S|> |S]. 
Proof: For |S| = 1, |S| = |O| = 0. For |S| > 2, we know that 
|S| = |S|. Suppose |S| = |S]. Then any attribute Ain$ must be 
contained in at least two distinct subsets of S. Let AE 9;N ;. Then 
A €Q;and A € Q;. There is aj # i such that A € Q;. Since Q; C 0; = 
Uz ~jQe it follows that 


AEQNQES. 


Since |S| = |S| = 2, S is not empty. Now, neither operation 1 nor 
2 can be applied to reduce S. From Lemma 11 we know this contradicts 
the assumption that S is acyclic. Thus,|S|>|S|. 
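Definition 12 and Corollary 5 can be checked on small collections. The sketch below is a hedged illustration: the names `reduction` and `core` and the use of frozensets are assumptions made here, and duplicate sets are collapsed to one copy before the reduction, which is an interpretation rather than something the definition states.

```python
def reduction(sets):
    """Drop every set that is properly contained in another set."""
    unique = []
    for s in map(frozenset, sets):
        if s not in unique:          # assumption: keep one copy of duplicates
            unique.append(s)
    return [s for s in unique if not any(s < t for t in unique)]


def core(collection):
    """Core of S: reduction of {Omega_i intersected with the union of the others}."""
    sets = [frozenset(s) for s in collection]
    if len(sets) <= 1:
        return []                    # the core is empty when |S| = 1
    ys = []
    for i, s in enumerate(sets):
        others = frozenset().union(*(t for j, t in enumerate(sets) if j != i))
        ys.append(s & others)
    return reduction(ys)


path = [{"A", "B"}, {"B", "C"}, {"C", "D"}]
triangle = [{"A", "B"}, {"B", "C"}, {"A", "C"}]
```

For the acyclic path the core has a single member, so its size drops as Corollary 5 requires; for the triangle the core has the same size as the collection, which is consistent with the triangle not being acyclic.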

A JD $R[\Omega] = R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n]$, $\Omega = \Omega_1 \cdots \Omega_n$, is an acyclic
join dependency if $\{\Omega_i \mid i \in N\}$ is acyclic. A recursive condition for
acyclic join dependency is as follows:

Corollary 6: Let $R$ be a relation on the set of attributes $\Omega = \Omega_1 \cdots \Omega_n$.
Then

$$R[\Omega] = R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n]$$

is an acyclic join dependency iff

1. $(\Omega_i, \bar{\Omega}_i)$ is a decomposition pair of $R[\Omega]$, $i = 1, \ldots, n$,

2. $R[X_1 \cdots X_m] = R[X_1] \bowtie \cdots \bowtie R[X_m]$ is an acyclic join dependency
over the set $X_1 \cdots X_m \subseteq \Omega$, where $\{X_i \mid i = 1, \ldots, m\}$ is the
core of $\{\Omega_i \mid i \in N\}$.

Proof: The join dependency of a collection of sets and the join de-
pendency of its reduction are equivalent.” The proof easily follows
from Theorem 12 and Lemma 11. $\square$

The above corollary simply states that acyclic join dependency is 
equivalent to a set of MVDs and EMVDs, i.e., a set of simultaneous 
lattice equations that can be derived recursively. It has been shown by 
hypergraph theory that an acyclic join dependency is equivalent to a 
set of MVDs.'°® That is, the converse of Lemma 10 is true for acyclic 
join dependency; we will prove that the converse of Lemma 10 is a 
consequence of Corollary 6. 

Theorem 10: Let $R$ be a relation on the set of attributes $\Omega = \Omega_1 \cdots \Omega_n$
such that $\{\Omega_i \mid i \in N\}$ is acyclic. Suppose that, for any $N_0 \subseteq N = \{1, \ldots, n\}$
and $N_1 = N - N_0$,

$$R[\Omega] = R[\Omega_{N_0}] \bowtie R[\Omega_{N_1}].$$

Then

$$R[\Omega] = R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n]$$

is an acyclic join dependency.

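Proof: This theorem will be proved by mathematical induction on $n$.
For the smallest nontrivial case $n = 3$, let the core set $\{X_i \mid i = 1, \ldots, m\}$
of $\{\Omega_i \mid i = 1, 2, 3\}$ be the reduction of $\{Y_i = \Omega_i \cap \bar{\Omega}_i \mid i = 1, 2, 3\}$.
First we want to show that

$$R[X_1 \cdots X_m] = R[X_1] \bowtie \cdots \bowtie R[X_m].$$

We know $m < 3$ from Corollary 5. There is nothing to be proved if
$m < 2$. For $m = 2$, without loss of generality, let $X_1 = Y_1$, $X_2 = Y_2$,
and $Y_3 \subseteq Y_2 = X_2$. Then

$$X_1 \cap X_2 = Y_1 \cap Y_2 = Y_1 \cap Y_2 Y_3 = (Y_1 \cap Y_2) \cup (Y_1 \cap Y_3)
 = (\Omega_1 \cap \Omega_2) \cup (\Omega_1 \cap \Omega_3) = \Omega_1 \cap \Omega_2 \Omega_3.$$

Since $(\Omega_1, \Omega_2 \Omega_3)$ is a decomposition pair,

$$\theta(X_1 \cap X_2) = \theta(\Omega_1 \cap \Omega_2 \Omega_3) \subseteq \theta(\Omega_1) \circ \theta(\Omega_2 \Omega_3).$$

Also we have

$$\Omega_1 \supseteq Y_1 = X_1$$

and

$$\Omega_2 \Omega_3 \supseteq Y_2 Y_3 = Y_2 = X_2.$$

Thus

$$\theta(\Omega_1) \leq \theta(X_1), \qquad \theta(\Omega_2 \Omega_3) \leq \theta(X_2),$$

and

$$\theta(\Omega_1) \circ \theta(\Omega_2 \Omega_3) \subseteq \theta(X_1) \circ \theta(X_2).$$

It follows that

$$\theta(X_1 \cap X_2) \subseteq \theta(X_1) \circ \theta(X_2).$$

Hence

$$R[X_1 X_2] = R[X_1] \bowtie R[X_2].$$

It follows from Corollary 6 that

$$R[\Omega] = R[\Omega_1] \bowtie R[\Omega_2] \bowtie R[\Omega_3].$$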

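Suppose the theorem is true for all $k < n$. Let the core set $\{X_i \mid i = 1, \ldots, m\}$
of $\{\Omega_i \mid i \in N\}$ be the reduction of $\{Y_i = \Omega_i \cap \bar{\Omega}_i \mid i \in N\}$. We
know $m < n$, and for any $M_0 \subseteq M = \{1, \ldots, m\}$ and $M_1 = M - M_0$
there is an $N_0 \subseteq N$ and $N_1 = N - N_0$ such that

$$M_0 \subseteq N_0, \qquad M_1 \subseteq N_1,$$

and

$$X_{M_0} = Y_{N_0}, \qquad X_{M_1} = Y_{N_1}.$$

Then,

$$X_{M_0} \cap X_{M_1} = Y_{N_0} \cap Y_{N_1} = \bigcup_{i \in N_0,\, j \in N_1} (Y_i \cap Y_j)
 = \bigcup_{i \in N_0,\, j \in N_1} (\Omega_i \cap \Omega_j) = \Omega_{N_0} \cap \Omega_{N_1}.$$

Since $(\Omega_{N_0}, \Omega_{N_1})$ is a decomposition pair, we have

$$\theta(X_{M_0} \cap X_{M_1}) = \theta(\Omega_{N_0} \cap \Omega_{N_1}) \subseteq \theta(\Omega_{N_0}) \circ \theta(\Omega_{N_1}).$$

Also, we know

$$\Omega_{N_0} \supseteq Y_{N_0} = X_{M_0}$$

and

$$\Omega_{N_1} \supseteq Y_{N_1} = X_{M_1}.$$

Thus

$$\theta(\Omega_{N_0}) \leq \theta(X_{M_0}), \qquad \theta(\Omega_{N_1}) \leq \theta(X_{M_1}),$$

and

$$\theta(\Omega_{N_0}) \circ \theta(\Omega_{N_1}) \subseteq \theta(X_{M_0}) \circ \theta(X_{M_1}).$$

It follows that

$$\theta(X_{M_0} \cap X_{M_1}) \subseteq \theta(X_{M_0}) \circ \theta(X_{M_1}).$$

Hence

$$R[X_1 \cdots X_m] = R[X_{M_0}] \bowtie R[X_{M_1}]$$

for any $M_0 \subseteq M$, $M_1 = M - M_0$.
Since the theorem is true for $m < n$, we have

$$R[X_1 \cdots X_m] = R[X_1] \bowtie \cdots \bowtie R[X_m].$$

It follows from Corollary 6 that

$$R[\Omega_1 \cdots \Omega_n] = R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n]. \qquad \square$$

Further discussion of the properties of acyclic join dependencies can
be found in Refs. 15 and 16. A linear-time algorithm for testing
acyclicity is given in Ref. 28.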


VIII. CONCLUSIONS 


We have shown that lattice theory is a powerful tool in the analysis 
of the structure of relational database systems. Using this tool, we 
have established a unified theory of relations. As we have seen, almost 
every concept in the existing relational database theory has a counter- 
part in the lattice theory. This suggests that further study of relations 
should be carried out within the framework of lattice theory. The 
independency theory of lattices, which is a generalization of the 
familiar notion of independency in the geometries,’*” is especially 
important and relevant to the structure of relational database systems 
if its relation lattice is modular. This approach may lead to a geometric 
interpretation of data dependencies and independencies, which would 
make the theory more intuitive and also more useful for practical 
application. 

The establishment of this algebraic theory of relational databases is
done in the same spirit as the construction of probability theory. A
probability space is a triple $(\Omega, \Sigma, P)$, where $\Omega$ is the sample space, $\Sigma$
is a $\sigma$-algebra of the subsets of $\Omega$, and $P$ is a real-valued function,
called a probability measure, defined on the $\sigma$-algebra $\Sigma$.!"”° The notion
of a $\sigma$-algebra of sets also has an abstract generalization, namely it is
a particular case of a Boolean $\sigma$-algebra.*° A comparison of the alge-
braic theory of relational databases and probability theory is shown
in Table VIII.


Table VIII—Comparison of probability theory and
the theory of relational databases

Probability Theory                                        Theory of Relational Databases
--------------------------------------------------------  --------------------------------------------------------
Sample space $\Omega$                                     Set of attributes $\Omega$

$\Sigma$, the $\sigma$-algebra of subsets of $\Omega$     $2^{\Omega}$, the Boolean algebra of subsets of $\Omega$

Probability measure                                       Partition function
$P: \Sigma \to \mathbb{R}[0, 1]$                          $\theta: 2^{\Omega} \to \Pi[R(\Omega)]$

$\sigma$-additivity:                                      Meet-morphism:
$\{X_k\}$ is a denumerable union                          $\{X_i\}$ is a finite collection of
of disjoint events                                        sets of attributes
$P(\bigcup_{k=1}^{\infty} X_k) = \sum_{k=1}^{\infty} P(X_k)$   $\theta(\bigcup_i X_i) = \theta(X_1) \cdots \theta(X_n)$

$P(\Omega) = 1$                                           $\theta(\Omega) = 0$
$P(\emptyset) = 0$                                        $\theta(\emptyset) = 1$
$0 \leq P(X) \leq 1,\ \forall X \in \Sigma$               $0 \leq \theta(X) \leq 1,\ \forall X \in 2^{\Omega}$

If $X \subseteq Y$,                                       If $X \supseteq Y$,
$P(X) \leq P(Y)$                                          $\theta(X) \leq \theta(Y)$

If $\Omega_1$ and $\Omega_2$ are independent,             If $\Omega_1$ and $\Omega_2$ are decomposable,
$P(\Omega_1 \cap \Omega_2) = P(\Omega_1) P(\Omega_2)$     $\theta(\Omega_1 \cap \Omega_2) = \theta(\Omega_1) + \theta(\Omega_2)$
                                                          $= \theta(\Omega_1) \circ \theta(\Omega_2) = \theta(\Omega_2) \circ \theta(\Omega_1)$

We feel that this theory of relational databases can be used to 
analyze the nonquantitative aspects of data dependencies (or indepen- 
dencies), whereas probability theory is the basis of quantitative data 
analysis, namely statistics. This comparison is not meant to imply 
that there is a one-to-one correspondence between the theory of 
relational databases and the theory of probability. Nevertheless, we 
are convinced that the lattice theory could play a role in the theory of 
relational databases similar to the role measure theory plays in the 
theory of probability.” 

The computational algorithms for meet and join operations of 
partitions are given in Ref. 31, which provides the basic tools for 
future development of algorithms for relations.


APPENDIX A
Properties of Meet and Join Operations


In any lattice $(L, \cdot, +)$, the operations of meet and join satisfy the
following laws:

L1—$a \cdot a = a$, $a + a = a$; (Idempotent)
L2—$a \cdot b = b \cdot a$, $a + b = b + a$; (Commutative)
L3—$a \cdot (b \cdot c) = (a \cdot b) \cdot c$,
   $a + (b + c) = (a + b) + c$; (Associative)
L4—$a \cdot (a + b) = a + (a \cdot b) = a$; (Absorption)
L5—$a \leq b$ iff $a \cdot b = a$,
   $a \leq b$ iff $a + b = b$; (Consistency)
L6—$b \leq c$ implies $a \cdot b \leq a \cdot c$,
   $b \leq c$ implies $a + b \leq a + c$; (Isotone)
L7—$a \cdot (b + c) \geq (a \cdot b) + (a \cdot c)$,
   $a + (b \cdot c) \leq (a + b) \cdot (a + c)$; (Distributive Inequalities)
L8—$a \leq c$ implies $a + (b \cdot c) \leq (a + b) \cdot c$. (Modular Inequality)

A lattice is called distributive if equality holds in L7 and is called
modular if equality holds in L8. A Boolean algebra is a lattice $(L, \cdot, +, ')$
with the following additional properties:*°

L9—$a \cdot (b + c) = (a \cdot b) + (a \cdot c)$,
   $a + (b \cdot c) = (a + b) \cdot (a + c)$; (Distributive Identities)
L10—$a \leq c$ implies $a + (b \cdot c) = (a + b) \cdot c$; (Modular Identity)
L11—$L$ contains universal bounds 0, 1, which satisfy
   $0 \cdot a = 0$, $0 + a = a$,
   $1 \cdot a = a$, $1 + a = 1$;
L12—$\forall a \in L$, $\exists a' \in L$ such that
   $a \cdot a' = 0$, $a + a' = 1$, $(a')' = a$,
   $(a \cdot b)' = a' + b'$, $(a + b)' = a' \cdot b'$.





APPENDIX B 
The Proofs of Axioms for Functional and Multivalued Dependencies 


The first three of the following are Armstrong’s axioms for func- 
tional dependencies:!” 
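B1. (Reflexivity for functional dependencies)

If $Y \subseteq X \subseteq \Omega$, then $X \to Y$.
Proof: $\theta(X) = \theta(Y(X - Y)) = \theta(Y)\,\theta(X - Y) \leq \theta(Y)$. $\square$

B2. (Augmentation for functional dependencies)

If $X \to Y$ and $Z \subseteq \Omega$, then $XZ \to YZ$.
Proof: $\theta(XZ) = \theta(X)\,\theta(Z) \leq \theta(Y)\,\theta(Z) = \theta(YZ)$. $\square$

B3. (Transitivity for functional dependencies)

If $X \to Y$ and $Y \to Z$, then $X \to Z$.
Proof: $\theta(X) \leq \theta(Y)$ and $\theta(Y) \leq \theta(Z)$ imply $\theta(X) \leq \theta(Z)$. $\square$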
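The three proofs above reduce functional dependencies to the ordering of partitions. As a hedged illustration, the Python sketch below computes the partition of a small relation induced by a set of attributes and tests $X \to Y$ as "every block of the partition for $X$ lies inside a block of the partition for $Y$". The names `theta`, `refines`, and `holds_fd`, and the representation of a relation as a list of dictionaries, are assumptions made for this example.

```python
def theta(relation, attrs):
    """Partition of tuple indices: two tuples share a block iff they agree on attrs."""
    blocks = {}
    for idx, row in enumerate(relation):
        key = tuple(row[a] for a in sorted(attrs))
        blocks.setdefault(key, set()).add(idx)
    return {frozenset(b) for b in blocks.values()}


def refines(p, q):
    """p <= q: every block of p is contained in some block of q."""
    return all(any(b <= c for c in q) for b in p)


def holds_fd(relation, x, y):
    """X -> Y holds iff theta(X) <= theta(Y)."""
    return refines(theta(relation, x), theta(relation, y))


R = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 1, "C": 2},
    {"A": 2, "B": 1, "C": 1},
]
```

With this relation, `holds_fd(R, {"A"}, {"B"})` is true and `holds_fd(R, {"B"}, {"A"})` is false; the empty attribute set yields the single-block partition, matching the convention that the partition of the empty set is the greatest element.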

The next three axioms apply to multivalued dependencies: 
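B4. (Complementation for multivalued dependencies)

If $X \twoheadrightarrow Y$ then $X \twoheadrightarrow \Omega - X - Y$.
Proof: $\theta(X) = \theta(XY) + \theta(XZ) = \theta(XY) \circ \theta(XZ) = \theta(XZ) \circ \theta(XY)$,
where $Z = \Omega - X - Y$. $\square$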
B5. (Augmentation for multivalued dependencies)

If $X \twoheadrightarrow Y$, and $V \subseteq W$, then $WX \twoheadrightarrow VY$.
Proof: Without loss of generality,* we can let $\Omega = ABCDEFGHIJKL$,
$X = ABCDEF$, $Y = BCGHFI$, $W = CDEFHIJK$, $V = EFIJ$ (see Fig.
9). Then $\Omega - X - Y = JKL$ and $\Omega - WX - VY = L$.

Fig. 9—Set of attributes $\Omega$ for B5.

We want to show that

$$\theta(ABCDEF) \subseteq \theta(ABCDEFGHI) \circ \theta(ABCDEFJKL)$$

implies

$$\theta(ABCDEFHIJK) \subseteq \theta(ABCDEFGHIJK) \circ \theta(ABCDEFHIJKL).$$

Suppose

$$t_1\,\theta(ABCDEFHIJK)\,t_2. \qquad (1)$$

Then

$$t_1\,\theta(ABCDEF)\,t_2.$$

There exists $t_3$ such that

$$t_1\,\theta(ABCDEFGHI)\,t_3\,\theta(ABCDEFJKL)\,t_2. \qquad (2)$$

From (1) and (2), we have

$$t_3\,\theta(JK)\,t_2\,\theta(JK)\,t_1.$$

It follows from (2) that

$$t_1\,\theta(JKG)\,t_3. \qquad (3)$$

Combining (2) and (3), we have

$$t_1\,\theta(ABCDEFGHIJK)\,t_3. \qquad (4)$$

Relation (2) also implies that

$$t_1\,\theta(HI)\,t_3.$$

From (1), we know

$$t_1\,\theta(HI)\,t_2,$$

and therefore

$$t_3\,\theta(HI)\,t_2. \qquad (5)$$

It follows from (2) and (5) that

$$t_3\,\theta(ABCDEFHIJKL)\,t_2. \qquad (6)$$

Combining (4) and (6), we have

$$t_1\,\theta(ABCDEFGHIJK)\,t_3\,\theta(ABCDEFHIJKL)\,t_2.$$

It follows that

$$\theta(ABCDEFHIJK) \subseteq \theta(ABCDEFGHIJK) \circ \theta(ABCDEFHIJKL). \qquad \square$$

* This proof is carried out in terms of equivalence relations (partitions). It is irrelevant
here whether an equivalence relation is the image of a single attribute or the image of a
set of attributes.
B6. (Transitivity for multivalued dependencies)

If $X \twoheadrightarrow Y$ and $Y \twoheadrightarrow Z$, then $X \twoheadrightarrow Z - Y$.

Proof: Again, without loss of generality, we can let $\Omega = ABCDEFGH$,
$X = AFGH$, $Y = BCFG$, $Z = CDGH$ (see Fig. 10). Then $Z - Y = DH$,
$\Omega - XY = DE$, $\Omega - YZ = AE$, $\Omega - X(Z - Y) = BCE$.

Fig. 10—Set of attributes $\Omega$ for B6.

We want to show that

$$\theta(AFGH) \subseteq \theta(ADEFGH) \circ \theta(ABCFGH)$$

and

$$\theta(BCFG) \subseteq \theta(ABCEFG) \circ \theta(BCDFGH)$$

imply

$$\theta(AFGH) \subseteq \theta(ADFGH) \circ \theta(ABCEFGH).$$

Suppose $t_1\,\theta(AFGH)\,t_2$. Then there exists $t_3$ such that

$$t_1\,\theta(ADEFGH)\,t_3\,\theta(ABCFGH)\,t_2. \qquad (7)$$

Since $t_2\,\theta(BCFG)\,t_3$, there exists $t_4$ such that

$$t_2\,\theta(ABCEFG)\,t_4\,\theta(BCDFGH)\,t_3. \qquad (8)$$

It follows that

$$t_1\,\theta(AFG)\,t_3\,\theta(AFG)\,t_2\,\theta(AFG)\,t_4. \qquad (9)$$

From (7) and (8), we have

$$t_1\,\theta(DH)\,t_3\,\theta(DH)\,t_4. \qquad (10)$$

Combining (9) and (10) yields

$$t_1\,\theta(ADFGH)\,t_4. \qquad (11)$$

From (7) and (8), we have

$$t_4\,\theta(H)\,t_3\,\theta(H)\,t_2.$$

It follows from (8) that

$$t_4\,\theta(ABCEFGH)\,t_2. \qquad (12)$$

Relations (11) and (12) yield

$$t_1\,\theta(ADFGH)\,t_4\,\theta(ABCEFGH)\,t_2.$$

Hence

$$\theta(AFGH) \subseteq \theta(ADFGH) \circ \theta(ABCEFGH). \qquad \square$$
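The proofs of B5 and B6 treat a multivalued dependency $X \twoheadrightarrow Y$ as the containment of $\theta(X)$ in the composition $\theta(XY) \circ \theta(XZ)$ with $Z = \Omega - X - Y$: whenever $t_1\,\theta(X)\,t_2$, some $t_3$ satisfies $t_1\,\theta(XY)\,t_3\,\theta(XZ)\,t_2$. The Python sketch below checks that condition directly on a small relation. It is an illustration under assumptions: the names `equiv`, `compose`, and `holds_mvd`, the representation of equivalence relations as sets of index pairs, and the course/teacher/book data are all invented for this example.

```python
def equiv(relation, attrs):
    """Equivalence relation (set of index pairs) of tuples agreeing on attrs."""
    n = len(relation)
    return {(i, j) for i in range(n) for j in range(n)
            if all(relation[i][a] == relation[j][a] for a in attrs)}


def compose(r, s):
    """Relational composition: (i, k) whenever (i, j) in r and (j, k) in s."""
    return {(i, k) for (i, j) in r for (j2, k) in s if j == j2}


def holds_mvd(relation, omega, x, y):
    """X ->> Y iff theta(X) is contained in theta(XY) composed with theta(XZ)."""
    x, y = set(x), set(y)
    z = set(omega) - x - y
    return equiv(relation, x) <= compose(equiv(relation, x | y),
                                         equiv(relation, x | z))


OMEGA = {"course", "teacher", "book"}
full = [
    {"course": "c1", "teacher": "t1", "book": "b1"},
    {"course": "c1", "teacher": "t1", "book": "b2"},
    {"course": "c1", "teacher": "t2", "book": "b1"},
    {"course": "c1", "teacher": "t2", "book": "b2"},
]
partial = full[:3]   # the (t2, b2) combination is missing
```

In `full` every teacher of the course is paired with every book, so the dependency from `course` to `teacher` holds; in `partial` the required intermediate tuple for one pair is absent and the check fails.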
The last two axioms relate functional and multivalued dependencies. 


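B7. If $X \to Y$ then $X \twoheadrightarrow Y$.
Proof: Let $Z = \Omega - XY$. We want to show that

$$\theta(X) \leq \theta(Y) \text{ implies } \theta(X) \subseteq \theta(XY) \circ \theta(XZ).$$

Suppose $t_1\,\theta(X)\,t_2$. Since $\theta(X) \leq \theta(Y)$ implies $\theta(XY) = \theta(X)$, then
$t_1\,\theta(XY)\,t_2$. It follows that

$$t_1\,\theta(XY)\,t_2\,\theta(XZ)\,t_2.$$

Hence

$$\theta(X) \subseteq \theta(XY) \circ \theta(XZ). \qquad \square$$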


B8. If $X \twoheadrightarrow Y$, $Z \subseteq Y$, and for some $W$ disjoint from $Y$, we have
$W \to Z$, then $X \to Z$.
Proof: Again, without loss of generality, we can let $\Omega = ABCDEFGH$,
$X = ACEF$, $Y = EFGH$, $Z = FG$, and $W = AB$ (see Fig. 11). Then
$\Omega - XY = BD$.

Fig. 11—Set of attributes $\Omega$ for B8.

We want to show that

$$\theta(ACEF) \subseteq \theta(ACEFGH) \circ \theta(ABCDEF)$$

and

$$\theta(AB) \leq \theta(FG)$$

imply

$$\theta(ACEF) \leq \theta(FG).$$

Suppose $t_1\,\theta(ACEF)\,t_2$. Then there exists $t_3$ such that

$$t_1\,\theta(ACEFGH)\,t_3\,\theta(ABCDEF)\,t_2.$$

Since

$$t_1\,\theta(FG)\,t_3 \quad \text{and} \quad t_3\,\theta(AB)\,t_2,$$

we have

$$t_1\,\theta(FG)\,t_3 \quad \text{and} \quad t_3\,\theta(FG)\,t_2,$$

and thus

$$t_1\,\theta(FG)\,t_2.$$

Hence

$$\theta(ACEF) \leq \theta(FG). \qquad \square$$


Generation of Syntax-Directed Editors With
Text-Oriented Features

Often, syntax-directed editors rely solely on menu selection for program
construction. We describe here the generation of hybrid editors that give a 
programmer the option of either (1) using menu selection and tree navigation 
as in a syntax-directed editor, or (2) entering text for parsing and navigating 
through the text as in a conventional editor at any stage during the expansion 
of a program. A prototype system, HEG (Hybrid Editor Generator), has been 
built to automatically generate such a hybrid editor from a high-level specifi- 
cation of a grammar for an application language. Each such generated hybrid 
editor is called an AGE (Automatically Generated Editor). We describe the 
HEG meta-language and briefly summarize the editing process in AGEs. We 
also describe possible extensions to the meta-language to describe program 
semantics, and the generation of the procedures to check those semantics 
during program construction. 


I. INTRODUCTION 


In the past, there has been a dichotomy between the way a devel- 
opment tool such as a text editor would manipulate the text of a 
computer program and the way another development tool such as a 
parser would manipulate the same text. The advent of syntax-directed 
editing has removed this difference by introducing the use of editors 
that store and manipulate programs entirely as (partially) instantiated 
syntax trees.’ ® The question, though, of whether a program should be 
manipulated only in terms of its syntax tree, or also in terms of its
textual representation, is yet to be resolved.®? Some editing systems
purport to allow the programmer both points of view, but provide only 
one, predetermined, choice for the manipulation of each construct in 
the language.’ We introduce here the concept of a hybrid editor, which 
integrates the tree-navigation and menu-selection capabilities of a 
syntax-directed editor with the text-navigation and the string-entry 
capabilities of a conventional editor. Either of these views of a program 
can be taken at any stage during the expansion of the program in a 
hybrid editor. 

One of the features of the hybrid editors discussed herein is the fact 
that a complete editing system is automatically generated from a high- 
level description of a language. The concrete and abstract syntaxes of 
the language are both described by one context-free grammar; and 
both the syntax-directed editor, and the incremental parser for use 
with the editor, are generated from this grammar. 

In this paper, we outline the overall design of a hybrid editor 
generator and describe briefly the interface presented to a programmer 
by the hybrid editor. We define the meta-language in which the 
application language is specified for the generator, and describe the 
operations performed on the language specification in the process of 
generating the editor. Based on this design, a prototype hybrid editor 
generator, called HEG, has been built and is being tested. We describe 
possible extensions to the meta-language to describe program seman- 
tics, and the generation of the procedures to check those semantics 
during program construction. 


II. BACKGROUND


Although syntax-directed editing can be very helpful to a program- 
mer who is unfamiliar with the language he is using, one who knows 
the syntax of the language may easily become frustrated with the 
plethora of menu choices that must be made to “write” a program 
substructure as simple as an assignment statement. For this reason, 
allowing the programmer the option of giving the editor a string to 
parse and insert into the partially expanded program is a desirable 
feature to have in such a menu-driven editor. We define a hybrid 
editor to be a syntax-directed editor with the ability to parse and to 
integrate into the program tree a string given at any time during the 
expansion of a program. 

There have been attempts to generate syntax-directed editors au- 
tomatically for several programming languages from specifications of 
the syntactic (given by a context-free grammar)° and the semantic 
(given using attributes for the symbols in the grammar) aspects of a 
language.’° There is also a system in use that requires only the addition 
of a context-free grammar for any new language for which it is to
operate.* (In this latter system, detailed knowledge of the structure of
the grammar seems to be required in order to use the editor with any 
grace.) Usually, however, such editors are built individually for each 
programming language. This is because (1) high-level programming 
languages have many context-sensitive semantic constraints that can- 
not be expressed by a context-free grammar, (2) programmers using 
high-level languages usually need more local context-sensitive aid than 
syntactic help, and (3) the external interface that such an editor is 
expected to provide depends heavily upon the language it supports. 

Little attention, however, has been paid to syntax-directed editors 
for special-purpose languages such as the input specification languages 
for Yet Another Compiler-Compiler (YACC)" and database interface 
languages such as HISEL.” Yet it is for these seldom used, but 
numerous, languages that a hybrid editor would be most useful. Users 
of such application programs often write their input in a file using a 
regular text editor and manually check conformity with the syntactic 
constraints imposed by the particular application program. Some of 
these application programs have parsers in their front ends to check 
the syntactic correctness of their input. (Sometimes these parsers are 
automatically generated from utilities such as YACC."!) Others just 
assume that their input is syntactically correct and abort when it is 
not. If, in place of the parser, a hybrid editor is available within the 
application program, the user can be guided by the program’s editor 
during input construction. 

A tool to automatically generate such editors for application lan- 
guages can be quite useful for several reasons: (1) A special-purpose 
application language, unlike a programming language, is likely to 
change more rapidly with an evolving application program. (2) Many 
application programs are not frequently used and hence their idiosyn- 
cratic syntactic constraints are likely to be forgotten. (3) The amount 
of syntactic help from a hybrid editor can be determined by the 
programmer. (4) The derivation tree built by the hybrid editor may be 
used by the application program to process its input in a more 
structured manner than is often possible when only a textual view of 
the input is available to the program. It is for these types of languages 
that HEG-generated hybrid editors, called AGEs (Automatically Gen- 
erated Editors), are most valuable. 


III. THE EDITING PROCESS


In a HEG-generated hybrid editor for a given language, a program 
is internally represented by a derivation tree of the program in that 
language, along with a symbol table containing the programmer- 
defined character strings. (See Appendix A for an example of a short 
session with an AGE.) When the programmer wishes to expand a
particular nonterminal in the tree by syntax-directed menu selection, 
and if there are two or more production rules for that nonterminal, 
then the AGE creates a one-item-at-a-time wraparound menu, on the 
bottom line of the screen, displaying the menu strings associated with 
those production rules. After a production rule is chosen for expansion, 
the internal node for the nonterminal is grown in such a way that the 
frontier of the subtree rooted at that internal node corresponds to the 
right-hand side of the production rule selected. If, however, the pro- 
grammer wishes to forego menu selection, he/she may expand a 
nonterminal by entering a string that is derivable from the nonter- 
minal. The editor will parse the string, build a subtree representing 
the string, and graft the subtree into the program tree at the node 
corresponding to the nonterminal. The string given by the programmer 
may be any combination of terminal strings and nonterminal names. 
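The menu-driven expansion step can be pictured as replacing the children of a tree node with the symbols on the right-hand side of the chosen rule. The Python sketch below is an illustration only: the `Node` class, the `RULES` table, and the function names are hypothetical and do not reflect how an AGE stores its trees.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str
    is_terminal: bool = False
    children: list = field(default_factory=list)


# Hypothetical rule table: each right-hand side lists (symbol, is_terminal).
RULES = {
    "query": [[("select", True), ("list", False), ("where", True), ("conds", False)]],
    "list": [[("item", False)]],
}


def expand(node, rule_index=0):
    """Grow `node` so that its children match the chosen rule's right-hand side."""
    rhs = RULES[node.symbol][rule_index]
    node.children = [Node(sym, term) for sym, term in rhs]
    return node


def frontier(node):
    """Left-to-right leaves of the subtree rooted at `node`."""
    if not node.children:
        return [node.symbol]
    return [leaf for child in node.children for leaf in frontier(child)]


root = expand(Node("query"))
expand(root.children[1])      # expand the 'list' nonterminal
```

After the two expansions, `frontier(root)` lists `select`, `item`, `where`, `conds`, which is the partially expanded program a programmer would see, with the unexpanded nonterminals still shown by name.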

Even though the internal representation of a program is a tree, both 
tree and text interfaces are provided to the programmer for navigation 
through the program. At any moment, a tree cursor navigating around 
the internal nodes of the tree is available. The programmer sees the 
cursor spanning the entire frontier of the subtree rooted at the tree- 
cursor node. He can move the tree cursor along the internal nodes of 
the tree (using the commands up [7], down [V], left-sibling under the 
same parent [<], right-sibling under the same parent [>], next internal 
node of the same type under the same parent [N], etc). Or, he can 
navigate textually (using the commands n for next word, b for back 
word, CR for next line, - for previous line) through the program. The 
commands ‘e’ and ‘u’ provide syntax-driven expansion and unexpan- 
sion facilities. Commands ‘p’ and ‘r’ allow parser-driven nonterminal 
expansion and file-reading facilities. 

When the programmer quits editing, both the text and the tree 
versions of the program are saved. The tree is saved by storing the 
leftmost derivation sequence of the production rules in the derivation 
tree and the symbol table. This sequence is used to reconstruct the 
tree for a later editing session on that program. The text is saved for
possible use by other tools. 
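Saving a tree as a leftmost derivation sequence and rebuilding it later can be sketched as a preorder walk and its inverse. The Python example below is a hedged illustration: the production numbering, the tuple layout `(symbol, production, children)`, and the use of `None` to mark a nonterminal that has not yet been expanded are assumptions made here, and the symbol table of programmer-defined strings is left out.

```python
# Hypothetical numbered productions: number -> (left-hand side, right-hand side).
PRODUCTIONS = {
    0: ("query", ["select", "list", "where", "conds"]),
    1: ("list", ["item"]),
    2: ("list", ["item", ",", "list"]),
}
NONTERMINALS = {"query", "list", "item", "conds"}
UNEXPANDED = None   # assumed marker for a nonterminal left unexpanded


def derivation(tree):
    """Preorder list of production numbers, i.e. a leftmost derivation."""
    symbol, prod, children = tree
    if symbol not in NONTERMINALS:
        return []
    if prod is UNEXPANDED:
        return [UNEXPANDED]
    seq = [prod]
    for child in children:
        seq.extend(derivation(child))
    return seq


def rebuild(symbol, seq):
    """Rebuild the subtree for `symbol`, consuming numbers from the iterator `seq`."""
    if symbol not in NONTERMINALS:
        return (symbol, None, [])
    prod = next(seq)
    if prod is UNEXPANDED:
        return (symbol, None, [])
    lhs, rhs = PRODUCTIONS[prod]
    assert lhs == symbol, "sequence does not match the grammar"
    return (symbol, prod, [rebuild(s, seq) for s in rhs])


tree = ("query", 0, [
    ("select", None, []),
    ("list", 2, [("item", None, []), (",", None, []),
                 ("list", 1, [("item", None, [])])]),
    ("where", None, []),
    ("conds", None, []),
])
saved = derivation(tree)
restored = rebuild("query", iter(saved))
```

Because the right-hand sides are rebuilt left to right, the sequence alone determines the tree shape, so `restored` equals `tree`.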


IV. ARCHITECTURE OF HEG 


HEG produces a generalized table-driven syntax-directed editor, 
linked with a parser, for a given application language. The application 
language is specified by an AGE grammar in the meta-language 
described in the next section. A parser generator front end, named
‘GENPAR’, generates a YACC file and a LEX file’* from this speci-
fication of a language. The parser generated from these two files is
capable of parsing strings, which may contain nonterminal names,
starting from any nonterminal of the grammar. The grammar tables
used by the editor, the parser, and the generic editing routines are 
then linked together to generate an AGE for the given language. 

Figure 1 shows the interactions between the different components 
of an AGE. 


V. THE META-LANGUAGE 


The “meta-language” for HEG is the grammar specification lan- 
guage in which the production rules of any context-free grammar are 
specified, along with the pretty-printing information required by the 
display utilities of the editor (e.g., indentation and color codes, if 
available) and the strings for the menu lists. As an example of the use 
of the language, 


    query : (QUERY)
          | "select" list "where" conds
          ;


is a normal production rule. Here, (QUERY) is a “user-friendly” name
for the nonterminal symbol ‘query’. The sequence of strings following
the character ‘|’ specifies a production for the nonterminal ‘query’.
Strings within double quotes in the production specification are ter-
minal characters and others are nonterminals. The terminal character
strings may also contain the following special characters denoting
pretty-printing information: ‘\n’ for a new-line, ‘~’ for a blank char-
acter, ‘\t’ for tabbing one position to the right and increasing the tab
count by one for the starting positions of the subsequent lines, and
‘\T’ for decreasing the tab count by one for the current and the
subsequent lines.


Fig. 1—Interactions between the components of an AGE. (Blocks shown:
application grammar, script generator, parser generator, grammar tables,
incremental parser, command interpreter, parser utilities, display
utilities, tree utilities, program tree.)

Another example is


    list : (ITEM_LIST)
         | +item ","
         ;


This is an iterative production rule. It specifies that the nonterminal,
‘list’, can be expanded into one or more occurrences of the nonterminal,
‘item’, separated by the terminal character string ‘,’. A ‘*’, in place of
the ‘+’ above, would denote zero or more occurrences of the nonter-
minal.

There may be more than one production rule for a nonterminal; if 
so, all of the rules for the nonterminal must be specified in one rule 
set, each preceded by a ‘|’. For example, the following two sets of rules 
show several possible expansions for the corresponding nonterminals: 


    clause : (CLAUSE)
           | field_name "~" op "~" constant
           | constant "~" op "~" field_name
           | field_name "~" op "~" field_name
           ;

    op     : (OPERATOR)
           | "="
           | "!="
           | ">="
           | "<="
           | ">"
           | "<"
           ;


For each production rule, the grammar specification must include 
an associated “expansion character”. This expansion character must 
be unique within the set of rules for that nonterminal. If there are two 
or more rules for the expansion of a nonterminal and if an instance of 
that nonterminal in the program is to be expanded, the programmer 
must indicate his choice by typing the expansion character associated 
with the desired rule. However, if the programmer wishes to examine 
all the choices for that nonterminal, the AGE creates a one-item-at- 
a-time wraparound menu on the bottom line of the screen, from which 
the programmer may choose a production. Therefore, the implementor 
(the person defining the grammar) must provide a “menu-item-string”, 
which is displayed in the menu for each production rule. These menu 
items should be suggestive of the associated production rule for better 
communication with the programmer. When a particular rule is chosen
for the expansion, the right-hand side of that rule replaces the left-
hand side nonterminal node in the program’s syntax tree.

To illustrate the provision of menu information in a grammar for
HEG as above and to clarify the grammar specifications further, a
complete set of rules for a database query language HISEL” is given
below. Sentences (queries) in this language have the form:


    select x1.fld1,x2.fld2, ... where (CLAUSES) or (CLAUSES) ...


where ‘x’ is a cursor name, ‘fld’ is a field name, and (CLAUSES) is a
conjunctive sequence of field conditions. The single characters follow-
ing the menu items in the rules are the expansion characters. For the
rule sets having just one production rule, meaningless menu-item
strings (e.g., ‘xxx’) and expansion characters (e.g., ‘x’) are used for
uniformity in the meta-syntax. Observe that there is no white space
in the menu-item strings.


    query    : (QUERY)
             | xxx x "select" list "\nwhere~" conds
             ;
    list     : (ITEM_LIST)
             | xxx x +item ","
             ;
    item     : (ITEM)
             | xxx x cursor "." field_name
             ;
    conds    : (CONDITIONS)
             | xxx x *disjunct "\n~~~or~"
             ;
    disjunct : (CLAUSES)
             | xxx x +clause "~,~"
             ;
    clause   : (CLAUSE)
             | field_op_const 1 field_name "~" op "~" constant
             | const_op_field 2 constant "~" op "~" field_name
             | field_op_field 3 field_name "~" op "~" field_name
             ;
    op       : (OPERATOR)
             | equal = "="
             | not_equal ! "!="
             | greater_equal 1 ">="
             | less_equal 2 "<="
             | greater > ">"
             | less < "<"
             ;


The left-hand side nonterminal of the first rule in the grammar 
specification (i.e., ‘query’, above) is the starting nonterminal. If no
production rule is available for a nonterminal (e.g., ‘field_name’), and
that nonterminal appears during the derivation of a program, then it 
is assumed to expand into a character string to be provided by the 
programmer at the time of expansion. Such nonterminals are said to 
derive “identifiers”. A regular expression specification for any such 
nonterminal can be used to restrict the format of the identifier strings 
that a programmer may supply at the time of expansion. HEG provides 
a default specification for all nonterminals deriving identifiers and 
having no regular expression specification. 
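The restriction of identifier strings by a regular expression can be pictured as a whole-string match performed when the programmer supplies the text. The Python sketch below is hypothetical: the source does not state HEG's default specification, so `DEFAULT_SPEC` is an invented stand-in, and the `exp_char` pattern is only loosely modeled on the character class that appears in the meta-grammar.

```python
import re

# Assumed default for nonterminals with no specification of their own.
DEFAULT_SPEC = r"[A-Za-z_][A-Za-z0-9_]*"

# Assumed per-nonterminal specifications (illustrative only).
SPECS = {
    "exp_char": r"[a-zA-Z0-9+*<>&]",
}


def accept_identifier(nonterminal, text):
    """True if `text` is an acceptable identifier string for `nonterminal`."""
    pattern = SPECS.get(nonterminal, DEFAULT_SPEC)
    return re.fullmatch(pattern, text) is not None
```

Using `re.fullmatch` rather than `re.match` ensures that the entire string must conform, so an expansion character such as `xy` is rejected even though its first character would match.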

As another example, the following is a grammar in the meta- 
language (meta-grammar) for the meta-language described in this 
section. In fact, one can invoke an AGE for this meta-language to 
enter a grammar specification for any user-defined language. 


    gram    : (AN_AGE_GRAMMAR)
            | xxx x cfrules "\n%%\n" rerules
            ;
    cfrules : (CONTEXT_FREE_RULES)
            | xxx x +ruleset "\n"
            ;
    ruleset : (A_RULE_SET)
            | xxx x nonterm_name "\t:~" user_name "\n" rules "\n;\n\T"
            ;
    rules   : (RHS_OF_A_RULE_SET)
            | xxx x +rule "\n"
            ;
    rule    : (A_RULE)
            | normal-rule n "|~" menu_string "~" exp_char "~" tklist
            | nonempty-rec-rule + "|~xxx~x~+" nonterm_name "~\"" separator "\""
            | empty-rec-rule * "|~xxx~x~*" nonterm_name "~\"" separator "\""
            ;
    tklist  : (A_SEQ_OF_TOKENS)
            | xxx x +token "~"
            ;
    token   : (TOKEN)
            | nonterminal-token n nonterm_name
            | terminal-token t "\"" terminal_str "\""
            ;
    rerules : (REGULAR_EXP_RULES)
            | xxx x *rerule "\n"
            ;
    rerule  : (REG_EXP_RULE)
            | xxx x nonterm_name "~~\~##!" reg_expr
            ;
    %%
    exp_char ~##\[a-zA-Z0-9+* <>&]


VI. THE PARSER GENERATOR FOR HEG 


The tools YACC and LEX are used to produce a parser and a 
scanner for an AGE system. The parser generator front end, GEN- 
PAR, takes, as input, the AGE grammar specification for the desired 
language and produces YACC and LEX specification files. (Note that 
the use of YACC implies that the input grammar must be LALR(1) 
in order to generate a parser for an AGE system.) 

The generated parser can be invoked to parse an input string 
representing the expansion of any nonterminal node in the program 
derivation tree. The string to be parsed can be any combination of 
terminal strings and nonterminal names that is derivable from the 
nonterminal at the “current” position of the editing cursor. 


6.1 The parsing of an input string 


When the programmer provides a string to be parsed, rather than 
making a menu selection, the AGE system opens a temporary file, 
writes the prefix $,NONTERMINAL_NAME$$ to the file (where 
NONTERMINAL_NAME represents the nonterminal from which the 
programmer’s input string is to be derived), and appends the input 
string. The parser is called to process the entire string contained in 
the file. The prefix is considered to be an integral part of the input 
string. If the portion of the string entered by the programmer cannot 
be derived from the nonterminal indicated by the prefix, the input will 
be considered syntactically incorrect. If a syntax error is found, the 
temporary file is saved in case the programmer should wish to edit the 
string and resubmit it to the parser. 

The parser will parse its input in a “backtracking bottom-up” 
fashion, constructing a syntax tree to represent the derivation of the 
input string. After the tree is constructed, a preorder traversal of the 
tree is performed, producing a list of production numbers representing 
the leftmost derivation of the input string. This list is then passed 
back to the controlling program in AGE. 

In an AGE, the YACC-generated parser is run subordinate to a 
“parser monitor” to provide a certain amount of parser backtracking, 
made necessary by the conflict resolution scheme of the LEX-gener- 
ated scanner. For such a scanner, the lexical tokens desired are 
specified by regular expressions. When two or more of the given regular 
expressions match equal-length segments of the input (starting at the
“current input position” of the scanner), and these are the longest
matches possible, the scanner will select the regular expression that 
had been listed first (textually) in the input specification file as the 
“correct” match, and return the corresponding token number. If the 
parser is expecting one of the other possible matches at the current 
input position, the parser will find a “syntax error” where no error 
may, in fact, exist. 

The parsers produced by the YACC program are standard shift/ 
reduce (“bottom-up”) parsers. The required backtracking is accom- 
plished through the interaction of code at two levels of the parsing 
scheme. At the lowest level, the handling of multiple matches in the 
scanner is slightly modified by forcing the scanner to “reject” an 
“identifier” match, after linking the token number associated with the 
match into a “token map.” In this manner, all possible matches for an 
“identifier” are linked into the token map for the parse. At the highest 
level, the parser monitor runs the shift/reduce parser, handing the 
parser token numbers from either the scanner or the token map, 
depending upon the current state of the parse. Then, if the YACC- 
generated portion of the parser discovers a syntax error in the input, 
the monitor can rotate the last set of entries in the token map to 
(temporarily) “forget” the “preferred” token number for the last 
“choice position” in the input stream, allowing the use of another 
possible token number for that position.* In this manner, no syntact- 
ically correct input will be declared to contain a syntax error, and an 
incorrect input will only be rejected after trying the allowable combi- 
nations of token numbers for the “identifier” positions. 
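The retry idea can be sketched as below. This is a simplified illustration under stated assumptions: the real monitor reuses the token map during a single parse, whereas this sketch simply re-runs a hypothetical `try_parse` callback on each candidate assignment. The function and parameter names are invented for the example.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence


def parse_with_retries(
    choices: Sequence[Sequence[int]],
    try_parse: Callable[[List[int]], bool],
) -> Optional[List[int]]:
    """Try candidate token numbers until the parser accepts.

    choices[i] lists the token numbers that match at input position i,
    with the scanner's preferred (first-listed) token first. Positions
    with a single entry are unambiguous.
    """
    # itertools.product varies the LAST position fastest, so the most
    # recent "choice position" is retried before earlier ones.
    for tokens in product(*choices):
        if try_parse(list(tokens)):
            return list(tokens)
    return None  # every allowable combination failed: a real syntax error


attempts: List[List[int]] = []


def toy_parser(tokens: List[int]) -> bool:
    """Stand-in for the YACC-generated parser (assumption for the demo)."""
    attempts.append(tokens)
    return tokens == [10, 21, 30]


result = parse_with_retries([[10], [20, 21], [30, 31]], toy_parser)
```

In this toy run the preferred tokens fail twice before the alternative at the second position succeeds, so a syntactically correct input is not rejected.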


6.2 What the parser generator does 


Several transformations must be applied to the AGE grammar to 
produce a grammar specification that is acceptable to YACC, and that 
will specify a parser offering the features and enforcing the constraints 
desired. All the transformations are performed and the YACC gram- 
mar specification is produced in a single pass over the input grammar. 


6.2.1 Starting the derivations from arbitrary nonterminals 


For every nonterminal in the AGE grammar, two additional produc- 
tions are generated. One such set of productions makes it possible to 


*Due to the relative simplicity of most application languages (in terms of permissible 
“identifier” combinations and possible token map complexity), the erroneous “identifier” 
token number will usually be in the last set of entries in the token map. So, in most 
cases, only the last one or two “identifiers” need to be retried (if the input really is 
syntactically correct). In all cases, the LALR(1) property of the grammar should aid in 
the isolation of groups of points in the “identifier” cross-product space that could not 
possibly be characteristic of a correct parse. The points in these groups, then, need not 
be considered individually during parsing. 
start a derivation from any nonterminal in the original grammar, even 
though YACC will allow the specification of only one start symbol for 
a grammar. Each such added production is of the form: 


ppstart : AGE_TERM_20 clause 


where “ppstart” is a newly defined start symbol for the grammar, 
“clause” is a nonterminal in the original grammar, and 
“AGE_TERM_20” is the token returned by the scanner when it reads 
the prefix (e.g., “$$clause$$”) encoding the nonterminal from which 
the remainder of the input is to be derived. This type of production 
also allows the enforcement of the rule that the input string must be 
derivable from the nonterminal at the “current” node in the program 
tree. 


6.2.2 Use of nonterminal names 


The second set of productions generated for each nonterminal allows 
the acceptance of a “user-name” for a nonterminal, in place of a string 
that could be derived from that nonterminal, wherever the nonterminal 
may appear in an input sentence. Each of these productions is of the 
form: 


clause : AGE_TERM_22 


where “AGE_TERM_22” is the token returned by the scanner when it 
reads the user-name for the nonterminal on the left-hand side of the 
production (e.g., “clause”). 
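The two productions added for each nonterminal could be emitted along these lines. The function name and the way token numbers are passed in are assumptions made for illustration; only the shapes of the two productions come from the text.

```python
from typing import List


def start_and_name_productions(nonterminal: str, start_token: int, name_token: int) -> List[str]:
    """Return the two extra YACC productions for one AGE nonterminal.

    start_token: number of the terminal returned for the "$$name$$" prefix.
    name_token:  number of the terminal returned for the user-name itself.
    """
    return [
        # Lets a derivation start from this nonterminal.
        f"ppstart : AGE_TERM_{start_token} {nonterminal}",
        # Lets the user-name stand in for a string derived from it.
        f"{nonterminal} : AGE_TERM_{name_token}",
    ]


rules = start_and_name_productions("clause", 20, 22)
```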


6.2.3 Iteration in the grammars 


The iterative specifications in the AGE grammar must be trans- 
formed into explicit left recursions, with the separators appropriately 
treated. Since the most general form of iteration in AGE grammars is 
that of a list item repeated zero or more times, with a nonwhite-space 
separator, that case is considered here. The AGE grammar specifica- 
tion for such a list might be: 


clauses : *clause “,” 


where “clause” is the nonterminal to be repeated in the list structure 
and “,” is the terminal separator to appear between list elements. 

The YACC specification corresponding to the above production 
would be: 


clauses : Xclause_2_0X 


where Xclause_2_0X is a new nonterminal encoding the nonterminal 
to be repeated as the list elements (e.g. “clause”), the number of the 
terminal string to be used as the list element separator (e.g., “2”), and 
the minimum number of times the nonterminal must appear in the 
list (e.g., “0”). Such encoding permits the use of the list item nonterminal, 
“clause,” as the repeated element in other lists with different 
separators and/or different minimal numbers of occurrences. 

To derive the required number of occurrences of “clause,” sets of 
productions of the following form must be added to the YACC gram- 
mar: 


Xclause_2_0X : ε 
             | Xclause_2_1X 


where “ε” represents the empty string. The second new nonterminal, 
Xclause_2_1X, denotes one or more occurrences of “clause” separated 
by occurrences of terminal number 2. For this second new nonterminal, 
sets of productions of the following form are added to the YACC 
grammar: 


Xclause_2_1X : clause 
             | Xclause_2_1X AGE_TERM_2 clause 


where “AGE_TERM_2” denotes the required separator (“terminal 
number 2”) between elements of the list. 

Note that other iterative specifications are specializations of the 
above case. If the list must be nonempty, only the second new nonter- 
minal, with its associated productions, is generated. If the list element 
separator is not significant (i.e., if it is any form of white space), then 
no terminal is encoded in the new nonterminal(s) or included in any 
of the new productions. 
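The transformation of an iterative specification into left-recursive productions can be sketched as follows. This is an illustrative reconstruction, not GENPAR itself: the function name is hypothetical, and the nonterminal name used when the separator is white space (no terminal number encoded) is an assumption, since the text does not show it.

```python
from typing import List, Optional


def list_productions(item: str, separator: Optional[int], minimum: int) -> List[str]:
    """Return YACC productions for a list of `item`.

    separator: terminal number of the separator, or None for white space.
    minimum:   0 (list may be empty) or 1 (list must be nonempty).
    """
    if minimum not in (0, 1):
        raise ValueError("this sketch only handles minimum counts of 0 or 1")
    sep_tag = "" if separator is None else f"_{separator}"
    sep_token = "" if separator is None else f" AGE_TERM_{separator}"
    one_or_more = f"X{item}{sep_tag}_1X"
    rules = [
        f"{one_or_more} : {item}",
        # Explicit left recursion, with the separator between elements.
        f"{one_or_more} : {one_or_more}{sep_token} {item}",
    ]
    if minimum == 0:
        zero_or_more = f"X{item}{sep_tag}_0X"
        rules = [
            f"{zero_or_more} : /* empty */",
            f"{zero_or_more} : {one_or_more}",
        ] + rules
    return rules


comma_list = list_productions("clause", 2, 0)
space_list = list_productions("statement", None, 1)
```

Encoding the item, separator, and minimum count in the generated name is what lets the same item appear in several lists without the productions colliding.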


6.2.4 Treatment of identifiers 


In an AGE input grammar, there are no explicit productions for the 
derivation of character strings representing identifiers. Any nonter- 
minal that does not appear on the left-hand side of any production, in 
an AGE grammar specification, may produce an identifier. Since there 
are no such implied rules in a YACC grammar specification, GENPAR 
must add explicit productions to the grammar to allow the reduction 
of terminal identifier strings to appropriate nonterminals. These pro- 
ductions are of the form: 


cursr : AGE_IDENTIFIER_5 


where an identifier number (e.g., “5”) is specified only if the default 
lexical specification is overridden for the nonterminal on the left-hand 
side of the production (see below). 

To handle the lexical specification for each terminal identifier, the 
grammar designer has two choices. The designer may use GENPAR’s 
default specification, which allows identifiers to be any combination 
of letters, digits, and the characters “-”, “_”, and “.”, which begins 
with a letter, and is not a reserved terminal string in the language. Alternatively, 
the designer may associate an arbitrary LEX specification with any nonterminal 
that may derive an identifier. The given specification would 
then be used to override the default identifier specification for that 
particular nonterminal. 
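The default identifier rule can be expressed as a regular expression plus a reserved-word check. The sketch below uses Python's `re` module rather than LEX, and the function name and the sample reserved set are assumptions for illustration.

```python
import re
from typing import Set

# A letter, followed by any mix of letters, digits, "-", "_", and ".".
DEFAULT_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9\-_.]*")


def is_default_identifier(text: str, reserved: Set[str]) -> bool:
    """True if `text` fits the default identifier rule and is not reserved."""
    return DEFAULT_IDENTIFIER.fullmatch(text) is not None and text not in reserved


reserved_strings = {"select", "where", "or"}  # hypothetical reserved terminals
checks = {
    name: is_default_identifier(name, reserved_strings)
    for name in ["field_1.name", "a-b", "select", "9lives", ""]
}
```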


6.2.5 Treatment of white space 


In the generation process, all types of white space in the AGE 
grammar specifications of terminal strings are treated identically, and 
are considered insignificant to the specification of the generated 
grammar. For example, if a “PROGRAM” is specified as a “list of 
statements separated by new-lines”, the YACC-generated parser will 
accept any “list of statements separated by any white space (e.g., 
blanks, tabs, or new-lines)” as a “PROGRAM”. 


6.3 Generation of YACC actions 


If the input string to the parser is valid, AGE requires the output of 
the parser to be a list of rule numbers, symbol table indices, and list 
item occurrence counts representing the leftmost derivation of the 
input string. Since the YACC-generated parser is a shift/reduce parser, 
the order in which that parser uses the grammar productions will not 
represent a leftmost derivation of the input. To produce the proper 
input for the AGE monitor, the parser builds a tree to represent the 
derivation of its input string as it parses the input string, and, as part 
of the last production applied (reduction to the start symbol), performs 
the preorder traversal of the tree, generating the required list of 
numbers. 

The building of this tree is the main activity of the “actions” 
generated for each production in the YACC specification. To generate 
code to appropriately link nonterminal sub-trees, the positions of all 
the nonterminals in a given production rule must be saved as the 
production is processed by the generator. This is accomplished by 
counting the tokens on the right-hand side of the given production 
rule and stacking the position numbers that correspond to nontermin- 
als. Actions are then generated that call precoded subroutines that 
link the tree nodes together. These subroutines, and their associated 
data and type declarations, are constant across all grammars, and are 
included in the appropriate sections of each generated YACC grammar. 


6.4 The size of the processed grammar 


The growth of the grammar in the generation process is actually 
not as large as may be imagined. If n = the total number of nonter- 
minals and r = the number of iterative nonterminals in the original 
AGE grammar, at most 3n + 3r < 6n productions, 2n − 1 nonterminals, 
and between 2n + 1 and 3n − r terminals are added to the grammar 
before YACC generates the parser. 


VII. ATTRIBUTES AND SEMANTIC PROCESSING 


One possible extension to this work would involve the addition of 
attributes to the grammars described above. These attributes could 
allow a certain amount of semantic checking of user programs, in 
addition to allowing interpretation and/or code generation (for simple 
application languages) during program construction. The basic prin- 
ciples behind attribute grammars are described elsewhere;** we shall 
only discuss the required extensions to our meta-language and the 
generation of evaluation functions for the attributes. 


7.1 Extensions to the meta-language 


The meta-language described in Section V can be enhanced to 
include attributes and their evaluation routines in each production. 
An example of the use of the enhanced meta-language is shown below. 


clause : CLAUSE 
    inherited: int temp_loc;; 
    synthesized: int next_temp; 

    | field_op_const 1 field_name “~” op “~” constant 
        { 
            $0.next_temp = $0.temp_loc; 
        } 

    | const_op_field 2 constant “~” op “~” field_name 
        { 
            $0.next_temp = $0.temp_loc; 
        } 

    | field_op_field 3 field_name “~” op “~” field_name 
        { 
            $0.next_temp = $0.temp_loc + 2; 
        } 

In general, the attributes and their evaluation rules are specified in a 
pseudo-C language notation. We provide data typing facilities for the 
attribute variables.* The inherited attributes of a nonterminal and 
their data types are listed after the key word “inherited”, which appears 
after the user-name of that nonterminal. The synthesized attributes 


*As with the attributed grammars for programming languages, our experience in 
using attributed grammars for application languages suggests that facilities that include 
standard libraries of functions, user-defined types, and global attributes are needed. 
of that nonterminal are then listed in a similar fashion. The attribute 
evaluation rules are listed after each production specification, within 
braces. “$0.attribute_name” in these rules refers to the correspond- 
ingly named attribute of the nonterminal in the left-hand side of the 
production. “$i.attribute_name” refers to the correspondingly named 
attribute of the ith nonterminal in the right-hand side of the produc- 
tion. 

For each inherited attribute of each nonterminal appearing in the 
right-hand side of a production rule, there must be an evaluation rule 
(a function call or an arithmetic expression) assigning a value to that 
attribute associated with that production rule. Similarly, for each 
synthesized attribute of the left-hand side nonterminal of a production 
rule, there must be a rule (a function call or an arithmetic expression) 
assigning a value to that attribute. These rules may take as arguments 
the inherited attributes of the left-side nonterminal and/or synthesized 
attributes of the nonterminals in the right-hand side. These rules must 
not, however, violate the properties defining L-attributedness™ of the 
grammar in order for the proposed evaluation scheme to work. 

For iterative production rules, the semantic specifications appear 
as: 


disjunct : (CLAUSES) 
    inherited:; 
    synthesized:; 

    | xxx x +clause “~,~” 
        { 
            int current_temp; 

            init: { 
                current_temp = 0; 
            } 

            repeat: { 
                $1.temp_loc = current_temp; 
                current_temp = $1.next_temp; 
            } 
        } 


The routine following the key word “init:” is executed when this 
production rule is initially chosen. Then, for each instance of the 
nonterminal “clause” under the parent “disjunct”, the specification 
following the key word “repeat:” is used. The “$1” in the latter 
specification refers to the instance of the nonterminal “clause” whose 
attributes are being evaluated. These evaluation specifications must 
follow the same guidelines as those of the regular productions. Non- 
terminals deriving “identifiers” (i.e., those having no production rules) 
are assumed to have only one synthesized attribute, “value”, whose 
value is the string entered by the programmer at the time of expansion. 
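The effect of the “init:” and “repeat:” specifications can be imitated in a few lines. This is a sketch only: the `Clause` class, its `temps_used` field (0 or 2, mirroring the example rules for “clause”), and the function name are hypothetical, and the loop stands in for the generated evaluators rather than reproducing them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clause:
    """Hypothetical tree node for one instance of "clause"."""
    temps_used: int     # temporaries the chosen rule consumes (0 or 2 in the example)
    temp_loc: int = 0   # inherited attribute, set by the parent
    next_temp: int = 0  # synthesized attribute, set by the clause's own rule

    def evaluate(self) -> None:
        # Mirrors "$0.next_temp = $0.temp_loc (+ 2)" in the clause rules.
        self.next_temp = self.temp_loc + self.temps_used


def evaluate_disjunct(clauses: List[Clause]) -> int:
    current_temp = 0                      # init:
    for clause in clauses:                # repeat: once per list instance
        clause.temp_loc = current_temp    # $1.temp_loc = current_temp
        clause.evaluate()
        current_temp = clause.next_temp   # current_temp = $1.next_temp
    return current_temp


clauses = [Clause(0), Clause(2), Clause(2)]
total_temps = evaluate_disjunct(clauses)
locations = [c.temp_loc for c in clauses]
```

Each clause inherits the first free temporary location from its left neighbor, which is the left-to-right flow that L-attributed grammars permit.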


7.2 Generation of the attribute evaluators 


In any specification of an application language as above, each 
implementor-defined attribute evaluation function is associated with 
a specific production. For each distinct attribute in the grammar, the 
calls to these functions are collected into one generated evaluation 
function, which determines the appropriate implementor-defined func- 
tion to call, based on the rule that produced the associated nonter- 
minal. To make the required environment information accessible 
during the attribute evaluation, a pointer to the tree node at which 
the generated function is to be evaluated is passed as a parameter to 
the generated function. The evaluation functions for inherited attri- 
butes must be passed a pointer to the parent node of the nonterminal 
node with which the attribute is associated (i.e., inherited attributes 
are evaluated when the associated nonterminal appears on the right- 
hand side of a production). Synthesized attribute evaluation functions 
must be passed a pointer to the nonterminal node with which the 
attribute is associated (i.e., synthesized attributes are evaluated when 
the associated nonterminal is expanded). 

Each generated evaluation function for an inherited attribute con- 
tains a section of code for each production in which the associated 
nonterminal can appear on the right-hand side. Similarly, each gen- 
erated evaluation function for a synthesized attribute contains a 
section of code for each production in which the associated nonter- 
minal appears on the left-hand side. Within each of these sections of 
code is the code for the appropriate implementor-defined function, 
along with the function calls required to evaluate the actual parameters 
of the implementor’s function. The attributes are evaluated only as 
required, and the evaluation nesting is managed automatically by the 
C (target programming language) procedure calling mechanism. 


VIII. REMARKS 


The editor generator HEG, as described here (excluding the seman- 
tic analysis), currently exists and has been used for a variety of 
application languages. The novel aspects of this system include: (1) the 
ability to give a programmer the option to use menu selection or to 
enter strings containing terminal characters and nonterminal names 
at any stage during the expansion of a program in the user-language, 
and (2) the ability to generate a hybrid editor from a high-level 
specification of a user grammar. This system was an experiment to 
investigate only these aspects of editing. Various other related issues 
such as building a robust syntax-sensitive editor for higher-level 


3220 THE BELL SYSTEM TECHNICAL JOURNAL, DECEMBER 1983 


languages (e.g., C language), experimenting with different ways of 
exhibiting the menus, and human-factors issues are being addressed 
separately.° 
APPENDIX A 
An Editing Session With ‘Hisel.age’ 


$hisel.age 

program? test 

Initializing the tree/text for test.. 
Terminal Screen 


1) (QUERY) 

[Figure: program tree, with labels marking the cursor, an internal (nonterminal) node, and the leaf (terminal) node <query>.] 

r <- user command - invisible 
file name? temp 


Assume that there is a file named ‘temp’ in the working directory 
and that it contains the string “select (ITEM_LIST) where 
(CLAUSES)”. AGE will read it, recognize the two nonterminal names, 
parse the string, create the subtree, and graft it to the root as shown 
below. 


2) select (ITEM_LIST) 
where (CLAUSES) 


[Figure: program tree showing the cursor position. Leaves are not shown.] 

a 


This will add one more instance of CLAUSES. The screen after this 
command will be: 


3) select (ITEM_LIST) 


where (CLAUSES) 
or (CLAUSES) 


[Figure: program tree showing the cursor position.] 

-e 


AGE will install a CLAUSE in place of (CLAUSES) after the first e 
command. Since there is a choice for the expansion corresponding to 
the next e command, the following menu items will appear one after 
another if the user asks for a menu: 


expansion character (type ? for menu): ? <- user types this 
field_op_const 1 y(es), n(ext), q(uit)? n <- user types this 
const_op_field 2 y(es), n(ext), q(uit)? y 


The screen after this will be: 


4) select (ITEM_LIST) 
where (CLAUSES) 
or constant OPERATOR field_name 






W 


This command writes the above text into the file test.hisel and the 
tree representation into the file test.lmd. 




q 
AGE will then print: 
BYE! test hasn’t been fully expanded; call me back later. 


APPENDIX B 
AGE Command Summary 


Text-navigation commands: 


cr - a carriage return : go to the next line 
- - the character minus: go back one line 
space-bar or n: go to the next word 

b : go back one word 


Tree-navigation commands: 


: go to the parent of the current nonterminal 
: go to the first son of the current nonterminal 
: go to the right neighboring nonterminal under the same parent 
: go to the left neighboring nonterminal under the same parent 
: go to the next unexpanded nonterminal 
: go backward to the next unexpanded nonterminal 


PZAV< 
Nonterminal expansion commands: 


p : parse text for the current nonterminal 

r: read text from a file and parse it for the current nonterminal 
e : expand the current nonterminal by menu selection 

a: append an instance of a ‘listy’ (i.e., iterative) nonterminal 

i: insert an instance of a ‘listy’ nonterminal 

d : delete the current instance of the ‘listy’ nonterminal 


Other commands: 


U : unexpand nonterminal 
W : write program 
q : quit editing with AGE 
In this paper we analyze the performance of a preemptive priority queue. 
We give the model description in the context of a packet communication 
system where message sources, having different priorities, share a common 
communication channel. Each source generates, as an independent Poisson 
process, messages consisting of an arbitrarily distributed, random number of 
fixed-length packets. The channel server can only begin service at integer 
multiples of the packet transmission time (i.e., a time-slotted channel is 
assumed), and the server will preempt an ongoing message transmission at 
the next packet boundary whenever there is a message arrival from a higher- 
priority source. The average in-queue waiting time for each packet in any 
given source message and the average message delay are derived along with 
the corresponding moment-generating functions. Also, comparisons are made 
with the first-come first-served queueing discipline. 


I. INTRODUCTION 


We analyze the performance of a preemptive priority queueing 
system. To make clear at the outset the importance of the particular 
queueing system studied, we describe the system model in a packet 
communication context. Specifically, as Fig. 1 illustrates, a number of 
data sources share a single communication channel. Each source 
generates, according to a Poisson process, messages consisting of a 
Fig. 1—Queueing model for a packet communication system. 


random number of fixed-length data packets. The packets comprising 
a message arrive in bulk to be transmitted on the communication 
channel. For clarity, we view each source as having its own separate 
buffer to queue packets. Here, packets generated by the source wait 
for access to the channel. 

Packet transmissions on the channel are synchronized. More pre- 
cisely, time is divided into a sequence of fixed-length intervals or time 
slots. Each time slot is just large enough to allow the transmission of 
one packet, and packet transmissions must occur within time-slot 
boundaries. Hence, a packet arriving at the queue, at the very least, 
must wait until the start of the next time slot before its transmission 
can begin. 

Packets from any given source are served (i.e., transmitted) on a 
first-come first-served basis. The sources, however, are assigned fixed 
priorities: the first source has the highest priority, the last has the 
lowest. At the start of each time slot, the first packet queued from the 
highest-priority source is served. That is, a packet at the head of the 
source k buffer is transmitted if and only if the buffers associated with 
sources 1 to k — 1 are empty. Hence, an ongoing packet transmission 
cannot be preempted; however, an ongoing message transmission will 
be preempted (at the next slot boundary) whenever there is a message 
arrival from a higher-priority source. 

Such a priority queueing discipline arises naturally in many packet 
communication systems. The channel might be a link in a data 
communication network, or may simply be a shared data bus. The use 
of priority may be required to give more urgent messages lower delay. 
For example, one might choose to give network control messages 
higher priority than interactive data messages, which in turn are given 
higher priority than long file transfers. In some situations, the priority 
structure is inherent in the mechanism for sharing the channel among 
the independent message sources. This is the case with Datakit,’ 
where the module (i.e., the interface between the source and the 
channel) with the highest address always wins the channel contention. 
It is also true of some slotted ring systems, where the physical order 
of sources along the ring imposes a priority ordering for access to the 
channel.’ 

The first results on queues with preemptive priority appear to be 
due to White and Christie.* Shortly thereafter, others studied the 
problem using different assumptions about the service time distribu- 
tion. A comprehensive treatment of some of the early work is given in 
Jaiswal,* and a more up-to-date, but less comprehensive, discussion 
may be found in Kleinrock.® The models examined, however, all 
assume an “asynchronous” server where service starting times and 
preemption times are not constrained to certain periodically recurring 
points. The use of a synchronous service facility in queueing models 
arises in the context of computer and data communication systems 
where there is a natural elementary unit of time such as the machine 
cycle of a processor, or the bit, byte, or packet transmission time on a 
channel. Many such models are reviewed, and references given, in 
Kobayashi and Konheim.° As we indicated, the model we have selected 
for study has applications to slotted ring systems, and it is here that 
one finds analysis of other models similar to ours. The model that 
seems to come closest is by Konheim and Meister,” where the main 
differences have to do with the arrival process. Konheim and Meister 
assume discrete arrivals (between slots) of packets, whereas we assume 
continuous arrivals of messages with each message containing an 
arbitrarily distributed number of packets. In this way, we are better 
able to examine message delays in the system. 

In this paper we analyze the performance of the above preemptive 
priority queueing system. We begin in Section II by summarizing the 
queueing model and introducing performance measures that are of 
interest. In Section III we derive the average in-queue waiting time 
for each packet in any given source message. From this result we easily 
obtain the average delay in transporting a message. The corresponding 
moment-generating functions are derived in the appendix. Finally, in 
Section IV, we compare performance with the First-Come First-Served 
(FCFS) queueing discipline. 


II. QUEUEING MODEL 


In this section we briefly summarize the important points of the 
queueing model, and indicate the steady-state statistics that are of 
interest. Notation established here is used in the performance analysis 
that follows. 

The queueing system under study has the following properties: 

1. N sources of messages. 
2. Priorities are assigned to sources in decreasing order (i.e., source 
$k$ has higher priority than source $k + 1$, $k = 1, 2, \ldots, N - 1$). 

3. Source $k$ generates messages as an independent Poisson process 
with rate $\lambda_k$ messages per time slot. Each such message has its length 
(in packets) selected independently from the distribution $P_{m_k}(\cdot)$ with 
first and second moments $\bar{m}_k$ and $\overline{m_k^2}$, respectively. 

4. During busy periods, one packet is transmitted in each time slot 
and is always selected at the beginning of the time slot from the head 
of the highest-priority, nonempty source buffer. 

5. Each source buffer is assumed infinite, and packets enter and are 
removed from the buffer on a first-in first-out basis. 
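Properties 4 and 5 amount to a simple per-slot selection rule, sketched below. The function name and the use of `collections.deque` for the source buffers are choices made for this illustration; the arrival process is not simulated, and packets are just labeled strings.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


def serve_slot(buffers: List[Deque[str]]) -> Optional[Tuple[int, str]]:
    """Transmit one packet at a slot boundary.

    buffers[0] belongs to source 1 (highest priority). The packet at the
    head of the highest-priority nonempty buffer is served; buffers are FIFO.
    Returns (source number, packet), or None if the system is empty.
    """
    for source, buffer in enumerate(buffers, start=1):
        if buffer:
            return source, buffer.popleft()
    return None


# Source 2 has a two-packet message queued, source 3 a one-packet message.
buffers = [deque(), deque(["b1", "b2"]), deque(["c1"])]
log = [serve_slot(buffers)]   # slot 1: first packet of the source 2 message
buffers[0].append("a1")       # a source 1 message arrives during slot 1
log.append(serve_slot(buffers))  # slot 2: the source 2 message is preempted
log.append(serve_slot(buffers))  # slot 3: source 2 resumes
log.append(serve_slot(buffers))  # slot 4: source 3
log.append(serve_slot(buffers))  # slot 5: idle
```

Preemption never interrupts a packet in progress; it only changes which buffer is chosen at the next slot boundary.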

We define $W_{kj}$ as the steady-state in-queue waiting time for the $j$th 
packet in a message from source $k$, $k = 1, 2, \ldots, N$. In addition, we 
define 

$$\rho_k = \lambda_k \bar{m}_k,$$ 

where $\rho_k$ is interpreted as the fraction of time the server is busy with 
source $k$ packets. We also find it convenient to define 

$$\sigma_k = \sum_{i=1}^{k} \rho_i.$$ 


Other notation is introduced as needed in the analysis. 


III. PERFORMANCE ANALYSIS 

We begin this section by deriving $\bar{W}_{kj}$, the average in-queue waiting 
time for the $j$th packet in a message from source $k$. Using this result 
we then obtain the average delay in transporting a message from 
source k. Included in the discussion are specific numerical examples 
to illustrate the derived results. 


3.1 Average waiting time analysis 


In Fig. 2, observe that we may express the waiting time for the $j$th 
packet in a source $k$ message as 

$$W_{kj} = W_{k1} + \sum_{l=1}^{j-1} w_{kl}, \qquad (1)$$ 

where the incremental waiting time $w_{kl}$ is defined by 

$$w_{kl} = W_{k,l+1} - W_{kl}.$$ 

For a given message length, $m_k$, the random variables $\{w_{k1}, w_{k2}, \ldots, w_{k,m_k-1}\}$ 
are independent and identically distributed. We observe that 
at the beginning of a slot during which a packet from source $k$ is in 
service, there are no packets from sources 1 to $k - 1$ in the system. 
Any messages that arrive from sources 1 to $k - 1$ while this source $k$ 
packet is in service spawn a busy period of service, starting in the next 
slot, for sources 1 to $k - 1$. All such busy periods are independent and 
identically distributed, and hence so are the random variables $\{w_{k1}, w_{k2}, \ldots, w_{k,m_k-1}\}$. 

Fig. 2—Waiting times for packet transmissions. 

Note that the incremental waiting time $w_{kl}$ consists of one slot time 
to transmit the $l$th packet in the source $k$ message plus the time to 
serve all messages from sources 1 to $k - 1$ that arrive in the interval 
$w_{kl}$. Hence, the average incremental waiting time $\bar{w}_k$ satisfies 

$$\bar{w}_k = 1 + \sigma_{k-1} \bar{w}_k,$$ 

from which we obtain 

$$\bar{w}_k = \frac{1}{1 - \sigma_{k-1}}.$$ 

It then follows from (1) that the average in-queue waiting time for the 
$j$th packet in a source $k$ message is given by 

$$\bar{W}_{kj} = \bar{W}_{k1} + \frac{j - 1}{1 - \sigma_{k-1}}. \qquad (2)$$ 

Hence we are left with having to determine $\bar{W}_{k1}$, the average waiting 
time for the first packet in the message. 
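By applying standard queueing arguments, we have 

$$\bar{W}_{k1} = \frac{1}{2} + \sum_{i=1}^{k} \sum_{j=1}^{\infty} p_{ij} \bar{W}_{ij} + \sum_{i=1}^{k-1} \rho_i \bar{W}_{k1}, \qquad (3)$$ 

where 

$$p_{ij} \triangleq \lambda_i \Pr[m_i \ge j]. \qquad (4)$$ 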

PREEMPTIVE PRIORITY QUEUE 3229 


The first term on the right-hand side of (3) is simply the average time 
between the arrival of a message and the start of the next slot. The 
second term is, by Little’s result, the average number of packets of 
equal or higher priority awaiting transmission at the moment the 
message arrives. Finally, the last term corresponds to the average 
number of packets of higher priority that arrive while the first packet 
in the source k message waits on queue. 
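Now substituting (2) into (3) yields 

$$\bar{W}_{k1} = \frac{1}{2} + \sum_{i=1}^{k} \sum_{j=1}^{\infty} p_{ij} \left[ \bar{W}_{i1} + \frac{j - 1}{1 - \sigma_{i-1}} \right] + \sum_{i=1}^{k-1} \rho_i \bar{W}_{k1}. \qquad (5)$$ 

Note from the definition of $p_{ij}$ in (4) that 

$$\sum_{j=1}^{\infty} p_{ij} = \lambda_i \sum_{j=1}^{\infty} \Pr[m_i \ge j] = \lambda_i \sum_{j=1}^{\infty} \sum_{l \ge j} \Pr[m_i = l] = \lambda_i \sum_{l=1}^{\infty} \sum_{j=1}^{l} \Pr[m_i = l] = \lambda_i \sum_{l=1}^{\infty} l \Pr[m_i = l] = \lambda_i \bar{m}_i = \rho_i. \qquad (6)$$ 

Similarly, we have that 

$$\sum_{j=1}^{\infty} (j - 1) p_{ij} = \frac{\lambda_i}{2} \left( \overline{m_i^2} - \bar{m}_i \right). \qquad (7)$$ 

Hence, using (6) and (7), we may rewrite (5) as 

$$\bar{W}_{k1} = \frac{\dfrac{1}{2} + \sum_{i=1}^{k-1} \rho_i \bar{W}_{i1} + \sum_{i=1}^{k} \dfrac{\lambda_i (\overline{m_i^2} - \bar{m}_i)}{2(1 - \sigma_{i-1})}}{1 - \sigma_k}.$$ 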

Solving recursively, we obtain 

$$\bar{W}_{k1} = \frac{1 + \sum_{i=1}^{k} \lambda_i (\overline{m_i^2} - \bar{m}_i)}{2(1 - \sigma_{k-1})(1 - \sigma_k)}. \qquad (8)$$ 

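Finally, substituting (8) into (2) yields 

$$\bar{W}_{kj} = \frac{1 + \sum_{i=1}^{k} \lambda_i (\overline{m_i^2} - \bar{m}_i)}{2(1 - \sigma_{k-1})(1 - \sigma_k)} + \frac{j - 1}{1 - \sigma_{k-1}}. \qquad (9)$$ 

This concludes the derivation of the average in-queue waiting time 
$\bar{W}_{kj}$. The derivation of the moment-generating function for $W_{kj}$ (from 
which $\bar{W}_{kj}$ can be obtained directly) is given in the appendix. 

To illustrate the performance, we begin by considering a homogeneous 
system where $\lambda_k = \lambda$, $\bar{m}_k = \bar{m}$, and $\overline{m_k^2} = \overline{m^2}$ for $k = 1, 2, \ldots, N$. 
For this case, (8) becomes 

$$\bar{W}_{k1} = \frac{1 + \dfrac{k}{N} \rho \left( \dfrac{\overline{m^2}}{\bar{m}} - 1 \right)}{2 \left( 1 - \dfrac{k - 1}{N} \rho \right) \left( 1 - \dfrac{k}{N} \rho \right)}, \qquad (10)$$ 

where $\rho$ is the total system utilization (or load) defined by 

$$\rho = \sum_{i=1}^{N} \rho_i = N \lambda \bar{m} \quad \text{(for a homogeneous system)}.$$ 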
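The waiting-time results can be evaluated numerically with a few lines of code. The sketch below is an illustration written from our reading of (8), (9), and (10), with function names chosen for the example. Sources are numbered from 1 as in the text, while the Python lists are 0-based.

```python
from typing import Sequence


def first_packet_wait(k: int, lam: Sequence[float],
                      m1: Sequence[float], m2: Sequence[float]) -> float:
    """Eq. (8): average wait of the first packet of a source-k message.

    lam[i], m1[i], m2[i] are the message rate and the first and second
    moments of the message length for source i + 1.
    """
    rho = [l * m for l, m in zip(lam, m1)]
    sigma_prev = sum(rho[:k - 1])          # sigma_{k-1}
    sigma_k = sigma_prev + rho[k - 1]      # sigma_k
    if sigma_k >= 1.0:
        raise ValueError("load through source k must be below 1")
    numerator = 1.0 + sum(lam[i] * (m2[i] - m1[i]) for i in range(k))
    return numerator / (2.0 * (1.0 - sigma_prev) * (1.0 - sigma_k))


def packet_wait(k: int, j: int, lam: Sequence[float],
                m1: Sequence[float], m2: Sequence[float]) -> float:
    """Eq. (9): average wait of the j-th packet of a source-k message."""
    sigma_prev = sum(l * m for l, m in zip(lam[:k - 1], m1[:k - 1]))
    return first_packet_wait(k, lam, m1, m2) + (j - 1) / (1.0 - sigma_prev)


def first_packet_wait_homogeneous(k: int, n: int, load: float,
                                  m1: float, m2: float) -> float:
    """Eq. (10): the homogeneous special case of Eq. (8)."""
    numerator = 1.0 + (k / n) * load * (m2 / m1 - 1.0)
    return numerator / (2.0 * (1.0 - (k - 1) * load / n) * (1.0 - k * load / n))


# The example used for Fig. 3: N = 10 sources, constant 10-packet messages.
N, M1, M2, LOAD = 10, 10.0, 100.0, 0.5
lam = [LOAD / (N * M1)] * N
general = [first_packet_wait(k, lam, [M1] * N, [M2] * N) for k in range(1, N + 1)]
special = [first_packet_wait_homogeneous(k, N, LOAD, M1, M2) for k in range(1, N + 1)]
```

The two lists agree to rounding error, which is a useful check that (10) really is (8) with equal rates and moments.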


If we take $N = 10$ and assume a constant message length of 10 packets
(i.e., $\bar m = 10$, $\overline{m^2} = 100$), Fig. 3 is a plot of
$\bar W_{k1}$ vs. $\rho$ for k varying from 1 to 10. Also shown in Fig. 3 is
the average waiting time for the first-come first-served (FCFS) queueing
discipline, which is derived in Section IV. Note from (10) that if we
allow $N \to \infty$, then

$$\bar W_{11} \to \frac{1}{2}$$

and

$$\bar W_{N1} \to \frac{1 + \rho (\overline{m^2}/\bar m - 1)}{2(1-\rho)^2}.$$

These two expressions represent, respectively, lower and upper bounds
on the average first packet waiting time for all sources and arbitrary
N. These bounds are plotted as dashed lines in Fig. 3. Finally, if we
assume the same values for N, $\bar m$, and $\overline{m^2}$ as in Fig. 3,
Fig. 4 is a plot of the average incremental waiting time $\bar w_k$ vs.
$\rho$ for k varying from 1 to 10. Also shown in Fig. 4 is the upper bound
$1/(1-\rho)$ on $\bar w_k$, valid for all parameter values.

Fig. 3. Average first packet waiting time $\bar W_{k1}$ vs. total load $\rho$.
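
As an illustration of (10) and of the two limiting expressions, the
sketch below evaluates the first-packet waiting time for the parameters
quoted above ($N = 10$, $\bar m = 10$, $\overline{m^2} = 100$). The
function name and the choice $\rho = 0.5$ are assumptions made only for
this example.

```python
def w_k1_homogeneous(k, N, rho, m1, m2):
    """Average first-packet waiting time of source k from (10)."""
    num = 1.0 + (k / N) * rho * (m2 / m1 - 1.0)
    den = 2.0 * (1.0 - k * rho / N) * (1.0 - (k - 1) * rho / N)
    return num / den


N, m1, m2 = 10, 10.0, 100.0   # constant 10-packet messages
rho = 0.5                     # hypothetical total load
waits = [w_k1_homogeneous(k, N, rho, m1, m2) for k in range(1, N + 1)]

lower = 0.5                                                       # limit for source 1
upper = (1.0 + rho * (m2 / m1 - 1.0)) / (2.0 * (1.0 - rho) ** 2)  # limit for source N

for k, w in enumerate(waits, start=1):
    print(f"k={k:2d}  W_k1={w:7.3f}  (bounds {lower:.3f} .. {upper:.3f})")
```

Every value printed lies between the two bounds, and the waiting time
grows with k, which is the qualitative behavior described for Fig. 3.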


3.2 Average message delay analysis 


We now consider the average message delay. Defining $\bar D_k(m)$ as the
average delay (in slots) from the arrival to the queue of an m-packet
message from source k until the end of its transmission, we have

$$\bar D_k(m) = \bar W_{km} + 1.$$

Letting $\bar D_k$ denote the average delay over all messages from source k,
it follows, since $\bar W_{km}$ is linear in m, that

$$\bar D_k = \sum_{m} \bar D_k(m) P_{m_k}(m) = \bar W_{k \bar m_k} + 1, \tag{11}$$

where $\bar W_{k \bar m_k}$ denotes (9) evaluated at $j = \bar m_k$.

If we assume the same homogeneous system as represented in Figs. 3
and 4, Fig. 5 is a plot of $\bar D_k$ vs. $\rho$ for k varying from 1 to 10.
Also shown in Fig. 5 is an upper bound on $\bar D_k$, obtained from the
upper bounds on $\bar W_{k1}$ and $\bar w_k$. Specifically, we have

$$\bar D_k \le \frac{1 + \rho (\overline{m^2}/\bar m - 1)}{2(1-\rho)^2} + \frac{\bar m - 1}{1-\rho} + 1,$$

which depends on $\bar m$ and $\overline{m^2}$, but is valid for all sources
and arbitrary N.

Fig. 4. Average incremental waiting time $\bar w_k$ (slots) vs. total load $\rho$.

Fig. 5. Average message delay $\bar D_k$ (slots) vs. total load $\rho$.
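
The average message delay of (11) and its upper bound can be evaluated
directly from (8) and (9). The sketch below is one way to do so for
arbitrary (not necessarily homogeneous) sources; the function name and
the load $\rho = 0.5$ used in the example are assumptions.

```python
def mean_delays(lam, m1, m2):
    """Average message delay per source, combining (8), (9) and (11).

    lam[i], m1[i], m2[i]: message rate, mean and second moment of the
    message length for source i+1 (index 0 is the highest priority).
    """
    delays, sigma_prev, s = [], 0.0, 0.0
    for l, a, b in zip(lam, m1, m2):
        sigma = sigma_prev + l * a
        s += l * (b - a)
        w_first = (1.0 + s) / (2.0 * (1.0 - sigma) * (1.0 - sigma_prev))  # (8)
        # (9) evaluated at j = mean length, plus one slot for the last packet
        delays.append(w_first + (a - 1.0) / (1.0 - sigma_prev) + 1.0)
        sigma_prev = sigma
    return delays


# Homogeneous example: N = 10 sources, constant 10-packet messages.
N, m_bar, m_sq, rho = 10, 10.0, 100.0, 0.5
lam = [rho / (N * m_bar)] * N
delays = mean_delays(lam, [m_bar] * N, [m_sq] * N)
bound = ((1.0 + rho * (m_sq / m_bar - 1.0)) / (2.0 * (1.0 - rho) ** 2)
         + (m_bar - 1.0) / (1.0 - rho) + 1.0)
print([round(d, 2) for d in delays], round(bound, 2))
```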

To complete this section, we consider a nonhomogeneous system 
consisting of 10 host computers and 300 terminals. The terminals and 
hosts correspond to the message sources and may be viewed as sharing 
a common time-slotted bus. There is a priority ordering of the termi- 
nals and hosts, with terminals having priority over hosts (i.e., the 
terminals correspond to sources 1 to 300 and the hosts correspond to 
sources 301 to 310). Each host is assumed to generate two types of 
traffic: host-to-host file transfers consisting of fixed-length 32-packet 
messages, and host-to-terminal messages with an average message 
length of 2 packets and a standard deviation of 1. Each terminal, on 
the other hand, only generates messages that are one packet in length 
and destined to a host. The message generation rates for each of the 
two types of host traffic are the same for all hosts. Similarly, all 
terminals generate messages at the same rate. The specific generation 
rate of each traffic type is such that the total load on the channel is 
divided as follows: 30 percent host-to-host, 60 percent host-to-termi- 
nal, and 10 percent terminal-to-host. The average delay performance 
for this system is plotted in Fig. 6. Observe that the results obtained 
allow us to distinguish between different types of traffic generated by 
the same source. In particular, in Fig. 6, the average message delays
for the host-to-host and host-to-terminal traffic are
shown separately.
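
A minimal sketch of how the delays for this nonhomogeneous system might
be computed from (8), (9), and (11) is given below. It assumes that each
host is modeled as a single source whose message length is a mixture of
the two host traffic types, so that only the combined load and the
combined second-moment term of each host enter (8); this modeling choice
and the function name are our assumptions, not statements from the text.

```python
def mixed_system_delays(rho, n_term=300, n_host=10):
    """Average message delay (slots) per source and traffic type.

    Terminals are sources 1..n_term, hosts are sources n_term+1 onward.
    Load split: 10% terminal-to-host, 30% host-to-host, 60% host-to-terminal.
    """
    lam_t = 0.10 * rho / n_term             # one-packet terminal messages
    lam_hh = 0.30 * rho / (n_host * 32.0)   # fixed 32-packet host-to-host messages
    lam_ht = 0.60 * rho / (n_host * 2.0)    # host-to-terminal, mean 2 packets
    ht_second = 1.0 ** 2 + 2.0 ** 2         # variance + mean^2 = 5

    sigma_prev, s = 0.0, 0.0
    delays = {}
    for k in range(1, n_term + n_host + 1):
        if k <= n_term:
            load, excess = lam_t, 0.0       # lam (m2 - m1) vanishes when m = 1
        else:
            load = 32.0 * lam_hh + 2.0 * lam_ht
            excess = lam_hh * (32.0 ** 2 - 32.0) + lam_ht * (ht_second - 2.0)
        sigma = sigma_prev + load
        s += excess
        w_first = (1.0 + s) / (2.0 * (1.0 - sigma) * (1.0 - sigma_prev))  # (8)
        step = 1.0 / (1.0 - sigma_prev)     # mean spacing between successive packets
        if k <= n_term:
            delays[k] = {"terminal-to-host": w_first + 1.0}
        else:
            delays[k] = {"host-to-host": w_first + 31.0 * step + 1.0,
                         "host-to-terminal": w_first + 1.0 * step + 1.0}
        sigma_prev = sigma
    return delays
```

Because the delay of an m-packet message is linear in m, the
host-to-terminal entry uses the mean length of 2 packets, while the
host-to-host entry uses the fixed length of 32 packets.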

From the moment-generating function for $D_k$ derived in the appendix,
one can obtain the second moment of the message delay. This, in
turn, may be used to compute the message delay standard deviation.
For hosts 1 and 10 (i.e., the two extremes), Fig. 7 (host-to-host
messages) and Fig. 8 (host-to-terminal messages) show the mean delay
and the mean delay plus one, two, and three standard deviations
(denoted by $1\sigma$, $2\sigma$, and $3\sigma$). The second moment of message
delay depends on the first three moments of message length, and in
Fig. 8 we set $\overline{m^3} = 15$.


IV. COMPARISONS WITH FCFS 


In this section we compare the average delay performance of the 
priority queueing discipline studied in the previous section with that 
of the First-Come First-Served (FCFS) discipline. With the FCFS 
discipline, messages are served in the order in which they are gener- 
ated, independent of the source from which they originate. In this 
way, the FCFS discipline allocates the communication channel more 
fairly than does the priority discipline. For simplicity, we assume in 
the analysis a homogeneous system where $\lambda_k = \lambda$,
$\bar m_k = \bar m$, and $\overline{m_k^2} = \overline{m^2}$ for
$k = 1, 2, \ldots, N$.

Fig. 6. Average message delay vs. total load $\rho$.


The performance analysis of the FCFS queueing discipline is a
special case of the results obtained for the priority discipline.
Specifically, we combine the N independent Poisson streams into a single
Poisson stream (using the well-known result that the sum of independent
Poisson processes is a Poisson process) with rate $N\lambda$. From
(10) we have that the average in-queue waiting time for a message
generated by this combined (single) source is given by

$$\bar W_{FCFS} = \frac{\rho (\overline{m^2}/\bar m)}{2(1-\rho)} + \frac{1}{2}, \tag{12}$$

where again $\rho = N\lambda\bar m$ is the total system utilization. The
average message delay for the FCFS system is given by

$$\bar D_{FCFS} = \bar W_{FCFS} + \bar m = \frac{\rho (\overline{m^2}/\bar m)}{2(1-\rho)} + \frac{1}{2} + \bar m. \tag{13}$$

Fig. 7. Host-to-host message delay (slots) vs. total load $\rho$; message length = 32 packets (constant).


$\bar W_{FCFS}$ is plotted in Fig. 3 and $\bar D_{FCFS}$ is plotted in Fig. 5
for the assumed system parameter values.

It is worth noting that the waiting time and delay results given by
(12) and (13), respectively, differ from those corresponding to the
standard M/G/1 queueing system by the additional term 1/2. This 
added term results from the synchronous nature of the server and 
represents the average time an arriving message must wait before the 
start of the next time slot. 
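
The relation between (12), the single-source case of (8), and the
standard M/G/1 result can be checked with a few lines of arithmetic. In
the sketch below the M/G/1 mean wait is taken from the
Pollaczek-Khinchine formula with the service time equal to the message
length in slots; the function names and parameter values are
assumptions for illustration.

```python
def w_fcfs(rho, m1, m2):
    """Average FCFS in-queue waiting time from (12)."""
    return rho * (m2 / m1) / (2.0 * (1.0 - rho)) + 0.5


def w_first_single_source(lam, m1, m2):
    """Eq. (8) for a single source (k = 1, sigma_0 = 0)."""
    return (1.0 + lam * (m2 - m1)) / (2.0 * (1.0 - lam * m1))


def w_mg1(lam, m1, m2):
    """Pollaczek-Khinchine mean wait, lam E[S^2] / (2 (1 - rho))."""
    return lam * m2 / (2.0 * (1.0 - lam * m1))


N, lam, m1, m2 = 10, 0.005, 10.0, 100.0   # hypothetical homogeneous system
lam_total = N * lam                       # rate of the combined Poisson stream
rho = lam_total * m1

print(w_fcfs(rho, m1, m2), w_first_single_source(lam_total, m1, m2),
      w_fcfs(rho, m1, m2) - w_mg1(lam_total, m1, m2))
```

The first two values coincide, and the last is the half-slot
synchronization term.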

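We continue the priority and FCFS comparison by focusing on the
unfairness issue. Specifically, we consider the ratio of the average
message delay for source N to that of source 1, $\bar D_N/\bar D_1$. Since
all sources encounter the same average delay in the FCFS discipline,
$\bar D_N/\bar D_1 = 1$. With the priority discipline, source N has the
lowest priority and source 1 the highest; hence $\bar D_N/\bar D_1 > 1$ for
$\rho > 0$. In particular, we have from (9) and (11) that

$$\bar D_k = \frac{1 + \frac{k}{N}\rho \left[ (1 + c_m^2)\bar m - 1 \right]}{2 \left( 1 - \frac{k}{N}\rho \right) \left( 1 - \frac{k-1}{N}\rho \right)} + \frac{\bar m - 1}{1 - \frac{k-1}{N}\rho} + 1, \tag{14}$$

where $c_m^2$ is the squared coefficient of variation for the
message-length distribution defined by

$$c_m^2 = \frac{\text{variance}(m)}{(\bar m)^2}.$$

Hence, for large N we have from (14) that

$$\bar D_1 = \frac{1 + \frac{\rho}{N} \left[ (1 + c_m^2)\bar m - 1 \right]}{2 \left( 1 - \frac{\rho}{N} \right)} + \bar m \approx \bar m \left[ 1 + \frac{1}{2\bar m} \right]$$

and

$$\bar D_N = \frac{1 + \rho \left[ (1 + c_m^2)\bar m - 1 \right]}{2(1-\rho) \left( 1 - \frac{N-1}{N}\rho \right)} + \frac{\bar m - 1}{1 - \frac{N-1}{N}\rho} + 1 \approx \frac{\bar m \left[ (1 + c_m^2)\rho + \left( 2 - \frac{1}{\bar m} \right)(1-\rho) + \frac{2}{\bar m}(1-\rho)^2 \right]}{2(1-\rho)^2}.$$

It follows then that

$$\frac{\bar D_N}{\bar D_1} \approx \frac{(1 + c_m^2)\rho + \left( 2 - \frac{1}{\bar m} \right)(1-\rho) + \frac{2}{\bar m}(1-\rho)^2}{2(1-\rho)^2 \left[ 1 + \frac{1}{2\bar m} \right]} \approx \frac{2 + (c_m^2 - 1)\rho}{2(1-\rho)^2} \quad \text{for } \bar m \gg 1.$$

Observe that for large N and fixed $\rho$, the increase in
$\bar D_N/\bar D_1$ is approximately linear with the squared coefficient of
variation $c_m^2$. In Fig. 9, the ratio $\bar D_N/\bar D_1$ is plotted against
total utilization $\rho$ for the FCFS and priority disciplines with
$c_m^2 = 0$ and 1.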
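
To see how closely the large-N, large-$\bar m$ approximation tracks the
exact ratio obtained from (14), one can evaluate both. The sketch below
does this; the function names and the parameter values ($N = 10^4$,
$\bar m = 10^3$, $\rho = 0.5$) are assumptions chosen so that both
limiting conditions hold.

```python
def d_k(k, N, rho, m_bar, c2):
    """Average message delay of source k in a homogeneous system, from (14)."""
    a = (1.0 + c2) * m_bar - 1.0
    x_k = 1.0 - k * rho / N
    x_prev = 1.0 - (k - 1) * rho / N
    return (1.0 + (k / N) * rho * a) / (2.0 * x_k * x_prev) + (m_bar - 1.0) / x_prev + 1.0


def ratio_exact(N, rho, m_bar, c2):
    """Exact unfairness ratio D_N / D_1 from (14)."""
    return d_k(N, N, rho, m_bar, c2) / d_k(1, N, rho, m_bar, c2)


def ratio_approx(rho, c2):
    """Approximation for large N and large mean message length."""
    return (2.0 + (c2 - 1.0) * rho) / (2.0 * (1.0 - rho) ** 2)


N, m_bar, rho = 10_000, 1000.0, 0.5
for c2 in (0.0, 1.0):
    print(c2, ratio_exact(N, rho, m_bar, c2), ratio_approx(rho, c2))
```

In the approximation, raising $c_m^2$ by one unit raises the ratio by
$\rho / [2(1-\rho)^2]$, which is the linear dependence noted above.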

To complete this section, we compare the average delay performance
of the FCFS discipline with the overall average delay of the priority
discipline. That is, we compare $\bar D_{FCFS}$ as given by (13) to the
quantity

$$\bar D = \frac{1}{N} \sum_{k=1}^{N} \bar D_k.$$

Using the expression for $\bar D_k$ given in (14), we obtain after some
manipulation

$$\bar D = \frac{(1 + c_m^2)\bar m}{2(1-\rho)} + \frac{1}{2} \left[ 2\bar m - 1 - (1 + c_m^2)\bar m \right] \gamma + 1,$$

where

$$\gamma = \frac{1}{N} \sum_{k=1}^{N} \frac{1}{1 - \frac{k-1}{N}\rho}.$$

From this, one may show that

$$\bar D_{FCFS} \le \bar D \quad \text{for } 0 \le c_m^2 \le \frac{\bar m - 1}{\bar m}$$

and

$$\bar D_{FCFS} \ge \bar D \quad \text{for } c_m^2 \ge \frac{\bar m - 1}{\bar m}.$$

Hence, for sufficiently large message-length coefficient of variation
$c_m^2$, the overall average delay for the priority discipline is less
than the average delay for the first-come first-served discipline. Of
course, as we saw earlier, as $c_m^2$ increases so does the relative
unfairness of the priority discipline over the FCFS discipline.

Fig. 8. Host-to-terminal message delay vs. total load $\rho$.

Fig. 9. Ratio $\bar D_N/\bar D_1$ vs. total load $\rho$.
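
The crossover between the two disciplines can be examined numerically.
The sketch below averages (14) over the N sources, compares the result
with a closed form in which $\gamma$ is taken to be
$(1/N)\sum_{k=1}^{N} 1/[1 - (k-1)\rho/N]$ (the definition that follows
from summing (14); treat it as an assumption wherever the printed
definition is illegible), and then compares both with (13). Function
names and parameter values are hypothetical.

```python
def d_k(k, N, rho, m_bar, c2):
    """Average message delay of source k in a homogeneous system, from (14)."""
    a = (1.0 + c2) * m_bar - 1.0
    x_k = 1.0 - k * rho / N
    x_prev = 1.0 - (k - 1) * rho / N
    return (1.0 + (k / N) * rho * a) / (2.0 * x_k * x_prev) + (m_bar - 1.0) / x_prev + 1.0


def d_priority_avg(N, rho, m_bar, c2):
    """Overall average delay: arithmetic mean of (14) over the N sources."""
    return sum(d_k(k, N, rho, m_bar, c2) for k in range(1, N + 1)) / N


def d_priority_closed(N, rho, m_bar, c2):
    """Closed form of the overall average, with gamma as assumed above."""
    gamma = sum(1.0 / (1.0 - (k - 1) * rho / N) for k in range(1, N + 1)) / N
    return ((1.0 + c2) * m_bar / (2.0 * (1.0 - rho))
            + 0.5 * (2.0 * m_bar - 1.0 - (1.0 + c2) * m_bar) * gamma + 1.0)


def d_fcfs(rho, m_bar, c2):
    """Average FCFS message delay from (13), using m2 / m_bar = (1 + c2) m_bar."""
    return rho * (1.0 + c2) * m_bar / (2.0 * (1.0 - rho)) + 0.5 + m_bar


N, rho, m_bar = 10, 0.5, 10.0
threshold = (m_bar - 1.0) / m_bar
for c2 in (0.0, threshold, 2.0):
    print(c2, d_fcfs(rho, m_bar, c2), d_priority_avg(N, rho, m_bar, c2))
```

Below the threshold the FCFS delay is the smaller of the two, at the
threshold they coincide, and above it the priority discipline has the
smaller overall average.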


V. CONCLUSIONS 


We analyzed the performance of a preemptive priority queue, which 
has direct applications to packet communication systems. The main 
distinguishing feature of the system studied compared to the standard 
M/G/1 preemptive resume priority queue’ is that the server can only
begin serving a “customer” (and preemptions take place) at integer
multiples of time corresponding to packet slot boundaries in the
communication context. Mean value formulas for in-queue waiting
time and average message delay were derived and comparisons made
to the FCFS queueing discipline. A derivation of the waiting time and
delay moment-generating functions is given in the appendix.

APPENDIX


Derivation of the Waiting Time and Message Delay Moment-Generating 
Functions 


As we introduced in Section II, $W_{kj}$ is the steady-state in-queue
waiting time for the jth packet in a message from source k, $k = 1,
2, \ldots, N$. Its Moment-Generating Function (MGF), defined as

$$G_{W_{kj}}(v) = E[e^{v W_{kj}}],$$

is derived in this appendix. From this result, the message delay MGF
is easily obtained. The approach taken parallels in many respects the
analysis given in Section III.
We begin the derivation by examining the duration of a busy period
for sources 1 to k − 1, denoted by $Y_k$. Such a busy period starting in a
slot is initiated by one or more message arrivals from sources 1 to
k − 1 in the previous slot (which contains no packet from sources 1 to
k − 1). Let $A_k$ denote the total number of packets that arrive from
sources 1 to k − 1 in this previous slot. For the ith packet in this set,
we define the “sub-busy” period $X_k(i)$ to consist of the duration of the
“virtual” busy period (i.e., as if $i = A_k = 1$) initiated by the messages
(if any) that arrive from sources 1 to k − 1 while this ith packet is in
service. In other words, we conceptually reorder the priorities so that
each of the $A_k$ packets, and the sub-busy period it spawns, is served in
turn. This does not change $Y_k$ and is a standard approach to
busy-period analysis.
Due to the memoryless property of the arrival process, the sub-busy
period random variables $X_k(i)$, $i = 1, 2, \ldots, A_k$, are independent and
identically distributed (iid). In addition, note that $Y_k$ has the same
distribution as the generic random variable $X_k$, and satisfies the
relation

$$Y_k = A_k + \sum_{i=1}^{A_k} X_k(i). \tag{15}$$


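The Probability-Generating Function (PGF) for the discrete random
variable $X_k$ is defined as

$$\Phi_{X_k}(z) = E[z^{X_k}] = \sum_{i=0}^{\infty} z^i \Pr[X_k = i].$$

Using (15) and the result that $X_k$ is distributed as $Y_k$, we obtain

$$\begin{aligned}
\Phi_{X_k}(z) &= E[z^{Y_k}] \\
&= E\left[ z^{A_k + \sum_{i=1}^{A_k} X_k(i)} \right] \\
&= E\left[ (z \Phi_{X_k}(z))^{A_k} \right] \\
&= \Phi_{A_k}(z \Phi_{X_k}(z)),
\end{aligned} \tag{16}$$

where $\Phi_{A_k}(z)$ is the PGF for the random variable $A_k$.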

Now, $A_k$ is equal to the total number of packets arriving from
sources 1 to k − 1 in one time slot. Recall that each source i generates
messages as an independent Poisson process with rate $\lambda_i$ and each
such message has its length selected independently from the
distribution $P_{m_i}(\cdot)$, whose PGF we denote by $\Phi_{m_i}(z)$. It
follows then that

$$\begin{aligned}
\Phi_{A_k}(z) &= \prod_{i=1}^{k-1} \sum_{j=0}^{\infty} \frac{\lambda_i^j e^{-\lambda_i}}{j!} \left[ \Phi_{m_i}(z) \right]^j \\
&= \prod_{i=1}^{k-1} e^{\lambda_i [\Phi_{m_i}(z) - 1]}.
\end{aligned} \tag{17}$$

Hence, substituting (17) into (16), we obtain

$$\Phi_{X_k}(z) = \prod_{i=1}^{k-1} e^{\lambda_i [\Phi_{m_i}(z \Phi_{X_k}(z)) - 1]}. \tag{18}$$

As we shall see, $\Phi_{X_k}(z)$, the PGF for the duration of a busy period
for sources 1 to k − 1, plays an important role in the derivation of
$G_{W_{kj}}(v)$, the waiting time MGF.
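
Equation (18) defines the busy-period PGF only implicitly. One simple
way to evaluate it for $0 \le z \le 1$ is fixed-point iteration, sketched
below. The function names, the iteration count, and the two-source
example are assumptions; the check at the end uses the mean busy period
$\sigma_{k-1}/(1-\sigma_{k-1})$, which follows from taking expectations
in (15).

```python
import math


def phi_A(z, lam, pmfs):
    """PGF (17) of the packets arriving in one slot from the given sources.

    lam[i] is the message rate of source i+1; pmfs[i] maps a message
    length to its probability.
    """
    exponent = 0.0
    for rate, pmf in zip(lam, pmfs):
        phi_m = sum(p * z ** n for n, p in pmf.items())
        exponent += rate * (phi_m - 1.0)
    return math.exp(exponent)


def phi_X(z, lam, pmfs, iters=200):
    """Busy-period PGF: solve x = Phi_A(z x), eq. (18), by iteration."""
    x = 1.0
    for _ in range(iters):
        x = phi_A(z * x, lam, pmfs)
    return x


# Hypothetical higher-priority sources: fixed lengths 2 and 5 packets.
lam = [0.05, 0.02]
pmfs = [{2: 1.0}, {5: 1.0}]
sigma = sum(rate * sum(n * p for n, p in pmf.items()) for rate, pmf in zip(lam, pmfs))

h = 1e-6
mean_X = (phi_X(1.0, lam, pmfs) - phi_X(1.0 - h, lam, pmfs)) / h  # one-sided derivative at z = 1
print(mean_X, sigma / (1.0 - sigma))
```

The iteration converges because, for a total higher-priority load
below one, the map $x \mapsto \Phi_{A_k}(zx)$ has slope less than one on
the unit interval.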

Returning to eq. (1) in Section III, we note that $W_{kj}$ is the sum of
$W_{k1}$ and the j − 1 iid random variables $w_{k1}, w_{k2}, \ldots, w_{k,j-1}$.
Observe, however, that $w_{ki}$, $i = 1, 2, \ldots, j-1$, is distributed as
$X_k + 1$. That is, $w_{ki}$ is composed of the service time for the ith
packet plus the busy period for sources 1 to k − 1 initiated during this
service time. In addition, it follows that the waiting time for the
first packet in a source k message, $W_{k1}$, is statistically independent
of $w_{ki}$, $i = 1, 2, \ldots, j-1$. Hence we may write

$$G_{W_{kj}}(v) = G_{W_{k1}}(v) \cdot \left[ e^v \Phi_{X_k}(e^v) \right]^{j-1}. \tag{19}$$

This leaves us with having to determine the MGF for $W_{k1}$.

Let us for the moment consider the time-dependent behavior for
the number of packets queued from sources 1 to k. For time slot n, we
let $Q_k(n)$ denote the number of such packets queued just after the
beginning of the slot and, to be consistent with our previous notation,
we let $A_{k+1}(n)$ denote the number of packets that arrive from sources
1 to k during the nth slot. It follows that

$$Q_k(n+1) = \left[ Q_k(n) + A_{k+1}(n) - 1 \right]^+, \tag{20}$$

where

$$[\varepsilon]^+ = \begin{cases} \varepsilon & \text{if } \varepsilon \ge 0 \\ 0 & \text{if } \varepsilon < 0. \end{cases}$$

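From (20) we obtain the relation

$$E\left[ z^{Q_k(n+1)} \right] = E\left[ z^{[Q_k(n) + A_{k+1}(n) - 1]^+} \right], \tag{21}$$

which may be rewritten as

$$\begin{aligned}
\Phi_{Q_k(n+1)}(z) &= E\left[ z^{[Q_k(n) + A_{k+1} - 1]^+} \right] \\
&= \Pr[Q_k(n) + A_{k+1} = 0] + \Pr[Q_k(n) + A_{k+1} > 0] \cdot z^{-1} E\left[ z^{Q_k(n) + A_{k+1}} \mid Q_k(n) + A_{k+1} > 0 \right] \\
&= \Pr[Q_k(n) = 0] \Pr[A_{k+1} = 0] + z^{-1} \sum_{j=1}^{\infty} z^j \Pr[Q_k(n) + A_{k+1} = j] \\
&= \Pr[Q_k(n) = 0] \Pr[A_{k+1} = 0] + z^{-1} \left\{ E\left[ z^{Q_k(n) + A_{k+1}} \right] - \Pr[Q_k(n) = 0] \Pr[A_{k+1} = 0] \right\} \\
&= \Pr[Q_k(n) = 0] \Pr[A_{k+1} = 0] (1 - z^{-1}) + z^{-1} \Phi_{Q_k(n)}(z) \Phi_{A_{k+1}}(z),
\end{aligned} \tag{22}$$

where we have used the fact that $Q_k(n)$ and $A_{k+1}$ are statistically
independent. Taking the limit as $n \to \infty$ on both sides of (22) (the
limits exist for $\sigma_k < 1$) yields

$$\Phi_{Q_k}(z) = \Pr[Q_k = 0] \Pr[A_{k+1} = 0] (1 - z^{-1}) + z^{-1} \Phi_{Q_k}(z) \Phi_{A_{k+1}}(z), \tag{23}$$

where $Q_k$ represents the steady-state number of packets queued from
sources 1 to k at the beginning of a slot. Rearranging the terms in
(23), we obtain

$$\Phi_{Q_k}(z) = \frac{\Pr[Q_k = 0] \Pr[A_{k+1} = 0] (z - 1)}{z - \Phi_{A_{k+1}}(z)}. \tag{24}$$

Taking the limit as $z \to 1$ on both sides of (24) yields

$$\Pr[Q_k = 0] \Pr[A_{k+1} = 0] = 1 - \left. \frac{d}{dz} \Phi_{A_{k+1}}(z) \right|_{z=1} = 1 - \sigma_k.$$

Hence, using this result and (17), (24) may be rewritten as

$$\Phi_{Q_k}(z) = \frac{(1 - \sigma_k)(z - 1)}{z - \prod_{i=1}^{k} e^{\lambda_i [\Phi_{m_i}(z) - 1]}}. \tag{25}$$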
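
The normalization used in (25), namely that the probability of finding
no queued packets and no arrivals in a slot equals $1 - \sigma_k$, can be
illustrated by simulating the recursion (20) directly. The sketch below
lumps sources 1 to k into a single stream of fixed-length messages; the
rate, the message length, the number of slots, the seed, and the
hand-written Poisson sampler are all assumptions made for this example.

```python
import math
import random


def poisson(rng, mean):
    """Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit, count, prod = math.exp(-mean), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return count
        count += 1


def idle_fraction(msg_rate, msg_len, slots, seed=1):
    """Fraction of slots with no queued packets and no arrivals."""
    rng = random.Random(seed)
    queued, idle = 0, 0
    for _ in range(slots):
        arrivals = msg_len * poisson(rng, msg_rate)  # packets arriving in this slot
        if queued + arrivals == 0:
            idle += 1
        queued = max(queued + arrivals - 1, 0)       # recursion (20)
    return idle / slots


# Hypothetical aggregate stream: 0.3 messages/slot, 2 packets each (load 0.6).
estimate = idle_fraction(msg_rate=0.3, msg_len=2, slots=200_000)
print(estimate, 1.0 - 0.3 * 2)
```

The estimate is a Monte Carlo value and therefore only approximately
equal to $1 - \sigma_k$; the agreement improves as the number of
simulated slots grows.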
Now consider the end of the time slot during which a source k
message is generated. The number of packets of higher or equal priority
that are queued and must be transmitted before the first packet in
this source k message is given by

$$Q_k + A_k + B_k,$$

where $Q_k$ is the number of queued packets from sources 1 to k just
after the beginning of the slot, $A_k$ is the number of packets from
sources 1 to k − 1 that arrive during the slot, and the new random
variable, $B_k$, represents the number of packets from source k that
arrive during the slot prior to the generation of the source k message
in question. The ith packet in this set of $(Q_k + A_k + B_k)$ packets
initiates a sub-busy period of duration $X_k(i)$. Hence we may write

$$W_{k1} = U + \sum_{i=1}^{Q_k + A_k + B_k} \left[ 1 + X_k(i) \right], \tag{26}$$

where U is a random variable, uniformly distributed over one slot
time, that represents the time from when the source k message is
generated until the start of the next slot.

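From (26) we may write

$$E\left[ e^{v W_{k1}} \mid U = u, Q_k = q_k, A_k = a_k, B_k = b_k \right] = e^{vu} \left[ e^v \Phi_{X_k}(e^v) \right]^{q_k + a_k + b_k}.$$

Removing the conditioning on the independent random variables $Q_k$
and $A_k$ yields

$$E\left[ \alpha^{W_{k1}} \mid U = u, B_k = b_k \right] = \alpha^u \left[ \alpha \Phi_{X_k}(\alpha) \right]^{b_k} \Phi_{Q_k}(\alpha \Phi_{X_k}(\alpha)) \, \Phi_{A_k}(\alpha \Phi_{X_k}(\alpha)), \tag{27}$$

where, for simplicity, we have substituted $\alpha$ for $e^v$. Now, using the
same approach as we did with $A_k$, we obtain

$$E\left[ z^{B_k} \mid U = u \right] = e^{\lambda_k (1-u) [\Phi_{m_k}(z) - 1]}.$$

Thus, removing the conditioning on $B_k$ in (27) yields

$$\begin{aligned}
E\left[ \alpha^{W_{k1}} \mid U = u \right] &= \alpha^u e^{\lambda_k (1-u) [\Phi_{m_k}(\alpha \Phi_{X_k}(\alpha)) - 1]} \, \Phi_{Q_k}(\alpha \Phi_{X_k}(\alpha)) \, \Phi_{A_k}(\alpha \Phi_{X_k}(\alpha)) \\
&= e^{u \{ v - \lambda_k [\Phi_{m_k}(\alpha \Phi_{X_k}(\alpha)) - 1] \}} \, \Phi_{Q_k}(\alpha \Phi_{X_k}(\alpha)) \, \Phi_{A_{k+1}}(\alpha \Phi_{X_k}(\alpha)).
\end{aligned}$$

Now, removing the conditioning on U, we obtain

$$G_{W_{k1}}(v) = G_U\left( v - \lambda_k [\Phi_{m_k}(\alpha \Phi_{X_k}(\alpha)) - 1] \right) \Phi_{Q_k}(\alpha \Phi_{X_k}(\alpha)) \, \Phi_{A_{k+1}}(\alpha \Phi_{X_k}(\alpha)),$$

where

$$G_U(v) = \int_0^1 e^{vu} \, du = \frac{1}{v} \left[ e^v - 1 \right]. \tag{28}$$

Finally using (19), we obtain

$$G_{W_{kj}}(v) = G_U\left( v - \lambda_k [\Phi_{m_k}(\alpha \Phi_{X_k}(\alpha)) - 1] \right) \cdot \Phi_{Q_k}(\alpha \Phi_{X_k}(\alpha)) \, \Phi_{A_{k+1}}(\alpha \Phi_{X_k}(\alpha)) \cdot \left[ \alpha \Phi_{X_k}(\alpha) \right]^{j-1},$$

where $\alpha = e^v$, $G_U(v)$ is given by (28), $\Phi_{Q_k}(z)$ is given by
(25), $\Phi_{A_{k+1}}(z)$ is given by (17), and $\Phi_{X_k}(z)$ is given by (18).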
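
As a consistency check, the mean first-packet waiting time can be
recovered from the MGF by numerical differentiation at $v = 0$ and
compared with (8). The sketch below does this under several
assumptions: the function names, the three-source example with fixed
message lengths, the fixed-point iteration used to solve (18), and the
central-difference step `h` are all ours, and the MGF is evaluated only
at small nonzero v because (25) takes the form 0/0 at $z = 1$.

```python
import math


def pgf(z, pmf):
    """PGF of a message-length distribution given as {length: probability}."""
    return sum(p * z ** n for n, p in pmf.items())


def phi_A(z, lam, pmfs, upto):
    """PGF of packets arriving per slot from sources 1..upto, as in (17)."""
    return math.exp(sum(lam[i] * (pgf(z, pmfs[i]) - 1.0) for i in range(upto)))


def phi_X(z, lam, pmfs, k, iters=300):
    """Busy-period PGF for sources 1..k-1: fixed point of (18)."""
    x = 1.0
    for _ in range(iters):
        x = phi_A(z * x, lam, pmfs, k - 1)
    return x


def mgf_first_wait(v, lam, pmfs, k):
    """MGF of W_k1 for small nonzero v."""
    alpha = math.exp(v)
    y = alpha * phi_X(alpha, lam, pmfs, k)
    sigma_k = sum(lam[i] * sum(n * p for n, p in pmfs[i].items()) for i in range(k))
    a_next = phi_A(y, lam, pmfs, k)                      # Phi_{A_{k+1}}(y)
    phi_Q = (1.0 - sigma_k) * (y - 1.0) / (y - a_next)   # (25)
    s = v - lam[k - 1] * (pgf(y, pmfs[k - 1]) - 1.0)
    return (math.expm1(s) / s) * phi_Q * a_next          # G_U(s) from (28)


def mean_first_wait(lam, pmfs, k, h=1e-4):
    """Central-difference estimate of the derivative of the MGF at v = 0."""
    return (mgf_first_wait(h, lam, pmfs, k) - mgf_first_wait(-h, lam, pmfs, k)) / (2.0 * h)


# Hypothetical sources with fixed message lengths of 2, 5 and 10 packets.
lam = [0.05, 0.02, 0.01]
pmfs = [{2: 1.0}, {5: 1.0}, {10: 1.0}]
for k in (1, 2, 3):
    print(k, mean_first_wait(lam, pmfs, k))
```

For these parameters, (8) gives approximately 0.611, 1.042, and 2.143
for sources 1, 2, and 3.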

The delay in transmitting a source k message of length m, $D_k(m)$, is
given by

$$D_k(m) = W_{km} + 1.$$

Hence, the MGF for $D_k(m)$ is given by

$$G_{D_k(m)}(v) = e^v G_{W_{km}}(v).$$

It follows that the MGF for $D_k$, the delay in transmitting a randomly
selected source k message, is given by

$$G_{D_k}(v) = G_{W_{k1}}(v) \, \Phi_{m_k}(e^v \Phi_{X_k}(e^v)) / \Phi_{X_k}(e^v).$$